AMD's Lemonade v10.8 Adds MCP Support, Letting Claude Desktop and Cursor Route Tasks to Local AMD GPUs

AMD-backed Lemonade v10.8, released June 17, now exposes a Model Context Protocol server, letting Claude Desktop, Cursor, and GitHub Copilot route inference tasks to local AMD Ryzen AI NPUs, Radeon GPUs, or plain CPUs — no cloud API required. The update also adds Moonshine speech-to-text, expanded R

AAAla SMITH & AI Research Desk·4d ago·4 min read··20 views·AI-Generated·Report error

Source: news.google.comvia gn_mcp_protocolWidely Reported

How do I use AMD's Lemonade AI server with Claude Code via MCP?

AMD's Lemonade AI server now supports the Model Context Protocol (MCP), enabling Claude Code to connect to local AI inference on AMD GPUs (ROCm). Install the MCP server, configure it in CLAUDE.md, and use local models for code generation and analysis without cloud dependency.

TL;DR

Lemonade v10.8 brings MCP server integration to AMD's free local AI stack, making Claude Desktop, Cursor, and Copilot able to offload inference to on-device hardware.

AMD's open-source Lemonade AI server hit version 10.8 on June 17, and the headline change is practical: the server now exposes a Model Context Protocol (MCP) endpoint, allowing MCP-compatible AI coding clients — including Claude Desktop, Cursor, and GitHub Copilot — to treat locally-running models as callable tools.

The result is that developers with AMD hardware can now route privacy-sensitive or high-volume tasks to local models without breaking out of their existing agent workflows. Bulk classification, on-device audio transcription, and image generation become free and offline, handled by Lemonade rather than a cloud API.

What Lemonade Is — and What v10.8 Changes

Lemonade is an AMD-backed, open-source local inference server that runs LLMs, image generation, speech-to-text, and text-to-speech entirely on local hardware. It uses llama.cpp for GGUF model inference with Vulkan or ROCm GPU acceleration, and OnnxRuntime GenAI for NPU-accelerated workloads on Ryzen AI 300 series devices.

Prior to v10.8, Lemonade already offered an OpenAI-compatible API at localhost:13305 — meaning any app built against OpenAI's API could point at it. The MCP layer added in v10.8 goes further: it registers Lemonade as a tool server that agents can discover and call dynamically, rather than requiring a manual base-URL swap.

MCP-compatible clients confirmed to work with v10.8:

Claude Desktop
Cursor
GitHub Copilot (VS Code)
Any client implementing the MCP open standard

The update also adds Lemonade Omni, a multimodal routing layer that exposes a single endpoint for chat, transcription, and image generation — the client sends a request and Lemonade directs it to the correct backend automatically.

Key Facts

Version: Lemonade v10.8, released June 17, 2026
New: MCP server integration, Lemonade Omni multimodal router
Also new: Moonshine speech-to-text model support; ROCm expanded to cover the GFX1152 / Radeon 860M iGPU; experimental NVIDIA GB10 ARM64 support
GPU support: AMD Radeon RX 7000 and 9000 series, Ryzen AI 300 series NPUs, integrated GPUs (Radeon 780M/760M/740M), CPU fallback
Install: pip install lemonade-sdk or the .msi installer from lemonade-server.ai
API port: localhost:13305 (OpenAI-compatible); MCP endpoint is separate and auto-registered
License: Open-source (GitHub: lemonade-sdk/lemonade)

Getting Started: Connecting Claude Desktop via MCP

The canonical setup for an MCP client is to register Lemonade as a server in the client's config. For Claude Desktop, that means adding a Lemonade entry to claude_desktop_config.json; for Cursor or Copilot, the MCP server settings panel handles it. AMD's documentation and the Hugging Face MCP course unit both provide step-by-step walkthroughs.

Start the server:

pip install lemonade-sdk
lemonade-server serve

Then point your MCP client at Lemonade's MCP endpoint — the exact JSON config varies by client but follows the standard MCP server registration format.

Why This Matters Now

The practical value of Lemonade's MCP support is task routing. Frontier model sessions — Claude, GPT-5, Gemini — are excellent at reasoning but expensive per token. Tasks like transcribing a meeting recording, classifying a batch of documents, or generating boilerplate code are often handled just as well by a local 7B or 8B model at zero marginal cost.

With MCP as the bridge, an agent running in Claude Desktop or Cursor can now delegate those sub-tasks to Lemonade without the user doing anything — the orchestrating model decides, calls Lemonade's MCP tools, and returns the result inline.

This matters specifically for:

Developers with proprietary codebases who cannot send source files to cloud APIs
Teams processing high volumes of repetitive inference tasks (test generation, doc comments, classification)
Air-gapped or travel workflows where connectivity is unreliable

AMD's positioning here is also notable. CUDA has long been the path of least resistance for local inference; Lemonade's ROCm and Vulkan backends, now bridged into the MCP ecosystem, give Radeon hardware a credible role in agentic developer tooling — a segment NVIDIA has dominated.

What to Watch

The Hugging Face MCP course already includes a dedicated Lemonade unit, suggesting growing ecosystem investment. Whether Claude Code (the CLI agent) gains native lemonade launch claude integration — which was added in v10.0 for direct launch workflows — extending to MCP-based tool routing remains the next logical step to watch in upcoming Lemonade releases.

Source: Phoronix — AMD's Lemonade AI Server Now Much More Useful With MCP Server Integration

Source: gn_mcp_protocol

Source: gentic.news · 4d ago · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

Following this story?

Get a weekly digest with AI predictions, trends, and analysis — free.

AI Analysis

**What should Claude Code users do differently?** First, if you own an AMD GPU, install Lemonade AI and enable its MCP server today. This gives you a local inference fallback that works with Claude Code—no cloud dependency. Add the MCP server URL to your CLAUDE.md so it's always available. Second, use this for code generation tasks that don't require Claude's full reasoning power. For example, boilerplate generation, simple refactors, or linting suggestions can be offloaded to the local model, saving your Claude Code API budget for complex tasks. Third, if you're in a team with mixed hardware (NVIDIA and AMD), standardize on MCP. Both CUDA and ROCm MCP servers expose the same interface, so Claude Code works identically regardless of the underlying GPU. Update your onboarding docs to include MCP server setup for both ecosystems. Finally, monitor the Lemonade MCP server's performance. If you notice latency, try smaller quantized models or adjust the batch size. The MCP protocol allows multiple concurrent requests, so you can parallelize simple tasks.

#claude code #hardware #amd #mcp #local ai

This story is part of

Claude Code's Campus Conquest Flips Anthropic's Talent Pipeline, Leaving Google's Academic Edge in Doubt

Viral adoption at MIT and Stanford transforms Claude Code from product into recruiting funnel, threatening Google's long-held research talent dominance

Compare side-by-side

Model Context Protocol vs ROCm

→

Mentioned in this article

Model Context Protocol Lemonade AI Server Claude Code AMD ROCm

Enjoyed this article?