How to Run Claude Code on Local LLMs with VibePod's New Backend Support

VibePod now lets you route Claude Code to Ollama or vLLM servers, enabling local model usage and cost savings.


VibePod, the development environment tool, has added a crucial feature for Claude Code users: the ability to connect Claude Code agents to external LLM servers. This means you can now run Claude Code against open-source models served by Ollama, vLLM, or any compatible endpoint.

What Changed — Direct LLM Routing for Claude Code

Previously, Claude Code only worked with Anthropic's official API. Now, VibePod can intercept and redirect Claude Code's API calls to your own LLM servers. The system exposes environment variables that make Claude Code think it's talking to Anthropic's servers while actually connecting to your local or remote model endpoints.
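Under the hood this relies on Claude Code's standard environment variables. A minimal sketch of the idea (the variable names are the ones Claude Code reads; exactly how VibePod injects them into the agent container is an assumption):

```shell
# Point Claude Code at a local endpoint instead of api.anthropic.com.
# ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY are the variables Claude Code
# honors; VibePod presumably sets the equivalents inside the container.
export ANTHROPIC_BASE_URL="http://host.docker.internal:11434"
export ANTHROPIC_API_KEY="ollama"   # placeholder key for local servers
echo "Requests will go to: $ANTHROPIC_BASE_URL"
```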

What It Means For You — Local Models, Cost Control, and Privacy

This update gives you three immediate advantages:

  1. Cost elimination: Run Claude Code against free, local models instead of paying per-token API costs
  2. Privacy: Keep sensitive code entirely on your infrastructure
  3. Model flexibility: Test Claude Code with different open-source models to find the best fit for your workflow

Try It Now — Quick Setup with Ollama

Here's how to get Claude Code running with a local Ollama instance in under 5 minutes:

# 1. Start Ollama (if it isn't already running) and pull a model
ollama serve &   # skip this line if the Ollama service is already running
ollama pull qwen3:14b

# 2. Configure VibePod
# Create or edit ~/.config/vibepod/config.yaml
cat > ~/.config/vibepod/config.yaml << EOF
llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"
EOF

# 3. Run Claude Code with the local model
vp run claude

Important: Use `host.docker.internal` (not `localhost`) so VibePod's Docker container can reach Ollama on your host machine.
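Before launching the agent, it's worth confirming that the container can actually see Ollama. Ollama's `/api/tags` endpoint lists installed models and needs no auth, so it makes a cheap connectivity probe (run this from inside the container):

```shell
# Sanity check: can the VibePod container reach Ollama on the host?
if curl -sf --max-time 3 http://host.docker.internal:11434/api/tags > /dev/null; then
  echo "Ollama reachable"
else
  echo "Ollama NOT reachable - check that it is running on the host"
fi
```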

Environment Variable Configuration

You can also configure this at runtime without editing config files:

# Claude Code with a remote Ollama server
VP_LLM_ENABLED=true \
VP_LLM_MODEL=qwen3.5:9b \
VP_LLM_BASE_URL=https://ollama.example.com \
  vp run claude

# Local Ollama with API key
VP_LLM_ENABLED=true \
VP_LLM_BASE_URL=http://host.docker.internal:11434 \
VP_LLM_API_KEY=ollama \
VP_LLM_MODEL=qwen3:14b \
  vp run claude
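If you switch models often, the environment-variable form is easy to wrap in a small shell function (a hypothetical helper, not part of VibePod; defaults match the values used above):

```shell
# Run a VibePod agent against local Ollama without retyping the variables.
# Arg 1: model tag (default qwen3:14b); arg 2: agent (default claude).
vp_local() {
  VP_LLM_ENABLED=true \
  VP_LLM_BASE_URL="http://host.docker.internal:11434" \
  VP_LLM_API_KEY="ollama" \
  VP_LLM_MODEL="${1:-qwen3:14b}" \
    vp run "${2:-claude}"
}

# Usage: vp_local                  # qwen3:14b + claude
#        vp_local llama3.1:8b codex
```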

Using vLLM or Other OpenAI-Compatible Servers

The same approach works with vLLM or any server that speaks the OpenAI or Anthropic API:

# ~/.config/vibepod/config.yaml
llm:
  enabled: true
  base_url: "http://my-vllm-server:8000/v1"
  api_key: "my-api-key"
  model: "meta-llama/Llama-3-8B-Instruct"

Note the endpoint difference: Claude Code uses the Anthropic-compatible endpoint (no /v1 suffix), while Codex uses the OpenAI-compatible endpoint (with /v1 suffix). Adjust your base_url accordingly.
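One way to keep the two conventions straight is to derive both URLs from a single server root (the server address below is a placeholder):

```shell
# Derive each agent's base_url from one server root.
SERVER_ROOT="http://my-vllm-server:8000"   # placeholder address
CODEX_BASE_URL="${SERVER_ROOT}/v1"         # OpenAI-compatible: /v1 suffix
CLAUDE_BASE_URL="${SERVER_ROOT}"           # Anthropic-compatible: no suffix
echo "Codex:  $CODEX_BASE_URL"
echo "Claude: $CLAUDE_BASE_URL"
```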

Per-Agent Configuration

If you need different LLM settings for specific agents, use per-agent environment overrides:

llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"

agents:
  claude:
    env:
      ANTHROPIC_BASE_URL: "http://different-server:11434"
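The intended precedence, per-agent env over the global llm block, can be sketched as follows (an assumption about VibePod's merge behavior, inferred from the word "overrides" above):

```shell
# Global value from llm.base_url; per-agent value from agents.claude.env.
GLOBAL_BASE_URL="http://host.docker.internal:11434"
AGENT_BASE_URL="http://different-server:11434"
# For the claude agent, the per-agent override takes effect:
EFFECTIVE_BASE_URL="${AGENT_BASE_URL:-$GLOBAL_BASE_URL}"
echo "claude talks to: $EFFECTIVE_BASE_URL"
```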

Disabling LLM Injection

To temporarily turn off LLM routing without removing your config:

# Via environment variable
VP_LLM_ENABLED=false vp run claude

# Or in config
llm:
  enabled: false
  # ... rest of config remains

What Works and What Doesn't

Currently, only Claude Code and Codex agents support LLM mapping. Other VibePod agents will ignore the LLM configuration and use their default backends.

Performance Considerations

Running Claude Code on local models will be slower than using Anthropic's optimized infrastructure. For best results:

  • Use quantized models (like qwen3:14b-q4_K_M) for faster inference
  • Ensure you have sufficient RAM/VRAM for your chosen model
  • Start with smaller models (7B-14B range) before trying larger ones
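A rough way to sanity-check the RAM/VRAM point above: multiply the parameter count by bytes per weight and add a margin for the KV cache and runtime. This is a rule of thumb, not an exact figure:

```shell
# Back-of-envelope memory estimate for a model.
# ~0.5 bytes/weight at 4-bit quantization, ~2 bytes/weight at fp16;
# the 1.2 factor is a rough allowance for KV cache and runtime overhead.
estimate_mem_gb() {
  local params_billions=$1 bytes_per_weight=$2
  awk -v p="$params_billions" -v b="$bytes_per_weight" \
    'BEGIN { printf "%.1f\n", p * b * 1.2 }'
}

estimate_mem_gb 14 0.5   # 14B model at 4-bit  -> 8.4
estimate_mem_gb 8  2     # 8B model at fp16    -> 19.2
```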

This integration opens up new possibilities for Claude Code users who want to experiment with different models or maintain complete control over their AI-assisted development workflow.

AI Analysis

Claude Code users should immediately test this integration with a local Ollama instance. The setup is straightforward and gives you a free alternative to Anthropic's API. Start by pulling a quantized model like `qwen3:14b-q4_K_M` or `llama3.1:8b` and configure VibePod to use it.

For daily use, consider creating separate VibePod configurations for different scenarios: one for local models when working on private code or experimenting, and another for production use with Anthropic's API when you need maximum performance. The per-agent configuration feature lets you mix and match: you could run Claude Code on a local model while keeping other agents on their default backends.

Remember that open-source models won't match Claude's coding capabilities exactly. You'll need to adjust your expectations and potentially your prompting style. However, for many routine coding tasks, models like Qwen or Llama can provide substantial value at zero cost.
Original source: vibepod.dev