How to Run Claude Code on Local LLMs with VibePod's New Backend Support
VibePod, the development environment tool, has added a crucial feature for Claude Code users: the ability to connect Claude Code agents to external LLM servers. This means you can now run Claude Code against open-source models served by Ollama, vLLM, or any compatible endpoint.
What Changed — Direct LLM Routing for Claude Code
Previously, Claude Code only worked with Anthropic's official API. Now, VibePod can intercept and redirect Claude Code's API calls to your own LLM servers. The system exposes environment variables that make Claude Code think it's talking to Anthropic's servers while actually connecting to your local or remote model endpoints.
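In practice this is plain environment-variable redirection. The sketch below shows the kind of variables the agent ends up seeing inside its container; ANTHROPIC_BASE_URL is the variable Claude Code reads for its API host, but the exact mapping VibePod performs from its own settings is an assumption here, not documented behavior:

```shell
# Illustrative sketch: what VibePod effectively exports inside the agent
# container (the VP_LLM_* -> ANTHROPIC_* mapping is an assumption).
export ANTHROPIC_BASE_URL="http://host.docker.internal:11434"
export ANTHROPIC_API_KEY="ollama"

# Claude Code now directs its API traffic at the local endpoint
echo "Claude Code will call: ${ANTHROPIC_BASE_URL}"
```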
What It Means For You — Local Models, Cost Control, and Privacy
This update gives you three immediate advantages:
- Cost elimination: Run Claude Code against free, local models instead of paying per-token API costs
- Privacy: Keep sensitive code entirely on your infrastructure
- Model flexibility: Test Claude Code with different open-source models to find the best fit for your workflow
Try It Now — Quick Setup with Ollama
Here's how to get Claude Code running with a local Ollama instance in under 5 minutes:
```shell
# 1. Make sure Ollama is running, then pull a model
ollama pull qwen3:14b

# 2. Configure VibePod by writing ~/.config/vibepod/config.yaml
cat > ~/.config/vibepod/config.yaml << EOF
llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"
EOF

# 3. Run Claude Code with the local model
vp run claude
```
Important: Use host.docker.internal (not localhost) so VibePod's Docker container can reach Ollama on your host machine.
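If the agent can't reach your model, the usual culprit is a localhost URL baked into the config. A one-line rewrite (purely illustrative) shows the fix:

```shell
# Rewrite a localhost base_url so it resolves from inside the container
url="http://localhost:11434"
fixed=$(echo "$url" | sed 's|localhost|host.docker.internal|')
echo "$fixed"   # http://host.docker.internal:11434
```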
Environment Variable Configuration
You can also configure this at runtime without editing config files:
```shell
# Claude Code with a remote Ollama server
VP_LLM_ENABLED=true \
VP_LLM_MODEL=qwen3.5:9b \
VP_LLM_BASE_URL=https://ollama.example.com \
vp run claude

# Local Ollama with API key
VP_LLM_ENABLED=true \
VP_LLM_BASE_URL=http://host.docker.internal:11434 \
VP_LLM_API_KEY=ollama \
VP_LLM_MODEL=qwen3:14b \
vp run claude
```
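Runtime variables normally take precedence over the config file. The `${VAR:-default}` pattern below is a generic shell sketch of that fallback logic, not VibePod's actual implementation:

```shell
# Generic precedence sketch: the env var wins; the config value is the fallback
config_model="qwen3:14b"                  # value as read from config.yaml
model="${VP_LLM_MODEL:-$config_model}"    # env override applies if set
echo "$model"                             # qwen3:14b when VP_LLM_MODEL is unset

VP_LLM_MODEL="qwen3:14b-q4_K_M"
model="${VP_LLM_MODEL:-$config_model}"
echo "$model"                             # qwen3:14b-q4_K_M
```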
Using vLLM or Other OpenAI-Compatible Servers
The same approach works with vLLM or any server that speaks the OpenAI or Anthropic API:
```yaml
# ~/.config/vibepod/config.yaml
llm:
  enabled: true
  base_url: "http://my-vllm-server:8000/v1"
  api_key: "my-api-key"
  model: "meta-llama/Llama-3-8B-Instruct"
```
Note the endpoint difference: Claude Code uses the Anthropic-compatible endpoint (no /v1 suffix), while Codex uses the OpenAI-compatible endpoint (with /v1 suffix). Adjust your base_url accordingly.
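The suffix rule can be captured in a small helper function. This helper is hypothetical, written only to make the rule concrete:

```shell
# Derive the base_url for each agent from a common server root.
# The /v1 rule follows the note above; the function itself is illustrative.
base_url_for() {
  local root="$1" agent="$2"
  case "$agent" in
    claude) echo "$root" ;;        # Anthropic-compatible endpoint: no /v1
    codex)  echo "$root/v1" ;;     # OpenAI-compatible endpoint: /v1 suffix
  esac
}

base_url_for "http://my-vllm-server:8000" claude   # http://my-vllm-server:8000
base_url_for "http://my-vllm-server:8000" codex    # http://my-vllm-server:8000/v1
```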
Per-Agent Configuration
If you need different LLM settings for specific agents, use per-agent environment overrides:
```yaml
llm:
  enabled: true
  base_url: "http://host.docker.internal:11434"
  api_key: "ollama"
  model: "qwen3:14b"

agents:
  claude:
    env:
      ANTHROPIC_BASE_URL: "http://different-server:11434"
```
Disabling LLM Injection
To temporarily turn off LLM routing without removing your config:
```shell
# Via environment variable
VP_LLM_ENABLED=false vp run claude
```

```yaml
# Or in config
llm:
  enabled: false
  # ... rest of config remains
```
What Works and What Doesn't
Currently, only Claude Code and Codex agents support LLM mapping. Other VibePod agents will ignore the LLM configuration and use their default backends.
Performance Considerations
Running Claude Code on local models will be slower than using Anthropic's optimized infrastructure. For best results:
- Use quantized models (like qwen3:14b-q4_K_M) for faster inference
- Ensure you have sufficient RAM/VRAM for your chosen model
- Start with smaller models (7B-14B range) before trying larger ones
This integration opens up new possibilities for Claude Code users who want to experiment with different models or maintain complete control over their AI-assisted development workflow.