Ollama now lets developers run OpenAI's Codex locally on open-source models, including DeepSeek V4, Gemma 4, and Qwen 3.6. The pitch: zero API costs and no rate limits for developers frustrated by cloud pricing and latency.
Key facts
- Ollama supports running Codex with DeepSeek V4, Gemma 4, and Qwen 3.6.
- No API costs or rate limits for local execution.
- Local execution likely requires a GPU with 16GB+ VRAM.
- Performance versus OpenAI's hosted Codex remains unmeasured.
- Adds market pressure on OpenAI and Microsoft to differentiate.
According to a post by @intheworldofai on X, Ollama now supports running OpenAI's Codex locally with open-source models including DeepSeek V4, Gemma 4, and Qwen 3.6. The headline promises are no API costs, no rate limits, and 100% local execution. The exact performance delta between these open models and OpenAI's proprietary Codex remains unmeasured, but the availability signals a shift toward self-hosted code generation.
The move targets developers frustrated by API pricing and latency. Codex, which powers GitHub Copilot and other tools, has historically required cloud access. Ollama's integration bypasses that entirely, though users must manage their own hardware, likely a GPU with at least 16GB of VRAM for models like DeepSeek V4.
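Neither the tweet nor Ollama has published setup steps for this integration, but the likely mechanism is Ollama's documented OpenAI-compatible endpoint at localhost:11434/v1, which lets any OpenAI-client tooling target a local model with a one-line base URL change. A minimal sketch, assuming a running Ollama server and a pulled model (the "deepseek-v4" tag is illustrative, not a confirmed name):

```python
# Minimal sketch: pointing an OpenAI-compatible client at a local Ollama
# server instead of OpenAI's hosted API. Ollama serves /v1 chat completions
# on localhost:11434; the api_key is required by the client but unused.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama, not api.openai.com
    api_key="ollama",                      # placeholder; Ollama ignores it
)

response = client.chat.completions.create(
    model="deepseek-v4",  # hypothetical tag; use the name `ollama list` shows
    messages=[
        {"role": "system", "content": "You are a coding assistant."},
        {"role": "user", "content": "Write a function that reverses a linked list."},
    ],
)
print(response.choices[0].message.content)
```

Because the endpoint speaks the same wire format as api.openai.com, tools built on the OpenAI client, Codex-style CLIs included, can in principle be redirected the same way.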
This is not the first local Codex-like option; alternatives such as Code Llama and StarCoder have existed for some time. But Ollama's ecosystem, already popular for running LLaMA, Mistral, and other models, makes it the most accessible distribution channel. The practical effect is to commoditize code generation APIs, pressuring OpenAI and Microsoft to differentiate on fine-tuning or enterprise features rather than raw access.
Key limitations: the source does not disclose specific benchmarks, model sizes, or installation steps. Users must verify compatibility with their hardware and model versions. The tweet links to an external guide, but no independent testing has been published yet.
What this means for the market
Local code generation removes the network round-trip and eliminates per-token costs, though generation speed now depends on local hardware rather than a provider's infrastructure. For individual developers and small teams, this could be transformative. For enterprises, data privacy concerns around sending code to an external API vanish. However, model quality may not match OpenAI's latest Codex, which has been fine-tuned on GitHub data.
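A quick sanity check makes the cost claim concrete: the only bill for a local completion is wall-clock compute time. A sketch under the same assumptions as above (the "qwen3.6" model tag is again illustrative):

```python
# Sketch: timing a local completion. There is no per-token bill; the only
# cost is local compute, which this wall-clock timer captures. Assumes a
# running Ollama server and a pulled model.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

start = time.perf_counter()
resp = client.chat.completions.create(
    model="qwen3.6",  # hypothetical tag from the tweet's model list
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
elapsed = time.perf_counter() - start

tokens = resp.usage.completion_tokens if resp.usage else 0
print(f"{elapsed:.2f}s wall clock, {tokens} completion tokens, $0.00 billed")
```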
The competitive landscape
OpenAI's Codex faces growing competition from open-weight models. DeepSeek V4, Gemma 4, and Qwen 3.6 each target code generation, with varying strengths across programming languages. Ollama's integration creates a unified interface, lowering the barrier to switching. According to @intheworldofai, the setup is straightforward, but no detailed documentation has been released.
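That low switching barrier is the structural threat to hosted APIs: because every model behind Ollama is served through the same endpoint, comparing the three models named in the tweet is a one-string change. A sketch under the same assumptions as above (all model tags hypothetical):

```python
# Sketch: the unified interface in practice. Every model behind Ollama is
# served through the same endpoint, so a comparison loop only varies the
# model string. Tags are illustrative; check `ollama list` for real names.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

PROMPT = "Write a SQL query that finds duplicate emails in a users table."

for model in ("deepseek-v4", "gemma4", "qwen3.6"):  # hypothetical tags
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"--- {model} ---")
    print(resp.choices[0].message.content[:300])
```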
What to watch
Watch for independent benchmarks on SWE-Bench or HumanEval comparing these local models to OpenAI's hosted Codex, and whether Ollama releases official performance numbers. Also monitor GitHub Copilot pricing adjustments if local adoption accelerates.
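Until independent numbers appear, developers can run rough spot checks themselves. The sketch below is a toy pass@1-style check in the spirit of HumanEval, not the real benchmark; the official suites use curated problem sets and sandboxed execution, and exec() on model output is unsafe outside a sandbox.

```python
# Toy pass@1-style check, not the real HumanEval: ask the local model to
# complete a function, then run hidden tests on the output. Official suites
# use curated datasets and sandboxed execution.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

TASK = "Complete this function and return only code:\ndef add(a, b):"
TESTS = [((1, 2), 3), ((-1, 1), 0)]

resp = client.chat.completions.create(
    model="gemma4",  # hypothetical tag; use the name `ollama list` shows
    messages=[{"role": "user", "content": TASK}],
)
code = resp.choices[0].message.content.strip()
if code.startswith("```"):  # models often wrap output in markdown fences
    code = code.strip("`").removeprefix("python").strip()

namespace: dict = {}
exec(code, namespace)  # unsafe outside a sandbox; toy illustration only
passed = all(namespace["add"](*args) == want for args, want in TESTS)
print("pass@1:", passed)
```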