JetBrains open-sourced Mellum2, a 12B-parameter Mixture-of-Experts model with 2.5B active parameters per token. The model targets code generation and agentic reasoning while using roughly one-fifth the per-token compute of a dense 12B model.
Key facts
- 12B total parameters, 2.5B active per token.
- Trained from scratch, not fine-tuned.
- Designed for code, reasoning, and agentic workflows.
- Claimed by JetBrains to be competitive with dense 14B models at roughly 5x lower per-token compute.
- Open-sourced by JetBrains (IDE maker).
JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model that activates only 2.5B parameters per token, according to @HuggingPapers. The model was trained from scratch—not fine-tuned from an existing checkpoint—and is designed for code, reasoning, and fast agentic workflows.
Architecture and Training
Mellum2 uses a sparse MoE architecture where each token routes to a subset of experts, keeping inference compute low. With 2.5B active parameters, it is competitive with dense 14B models, JetBrains claims. The training data mix and compute budget were not disclosed, but the model was trained from scratch, suggesting a significant upfront investment.
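The routing idea described above can be illustrated with a minimal top-k gating sketch. This is a generic illustration of sparse MoE routing, not JetBrains' implementation: the expert count, the top-k value, the toy experts, and every function name here are hypothetical, since Mellum2's actual router configuration is not described in the announcement.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token.

    Returns a list of (expert_index, weight) pairs whose weights are
    renormalized to sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_logits, top_k):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    per-token compute saving comes from.
    """
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, top_k):
        expert_out = experts[idx](token)
        out = [o + weight * v for o, v in zip(out, expert_out)]
    return out


# Hypothetical toy setup: 8 experts, 2 active per token.
# These values are illustrative only, not Mellum2's configuration.
NUM_EXPERTS, TOP_K = 8, 2

# Each toy "expert" just scales its input by a different constant.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

routing = route_token(logits, TOP_K)
output = moe_layer([1.0, 2.0], experts, logits, TOP_K)
```

In this sketch only `TOP_K` of the `NUM_EXPERTS` expert functions run for a given token, so the work per token scales with the number of active experts rather than with the total parameter count.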
Comparison to Prior Art
The 2.5B-active-parameter design places Mellum2 in the same sparse-MoE family as Mixtral 8x7B (46.7B total, ~12.9B active), but it activates a smaller share of its weights per token: about 21% versus about 28%. Compared to CodeLlama 13B (dense, 13B active), Mellum2 uses roughly one-fifth the active parameters per token, which implies about 5x less compute per token, while targeting similar code-generation quality. JetBrains did not publish benchmark scores against HumanEval or SWE-Bench, so third-party validation is pending.
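The "roughly 5x" figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes that per-token compute scales linearly with the number of active parameters, which ignores attention cost, routing overhead, and memory bandwidth; it is a rough estimate under that assumption, not a measured result. The function names are hypothetical.

```python
def active_fraction(active_b, total_b):
    """Share of a model's parameters used per token (both in billions)."""
    return active_b / total_b


def dense_to_sparse_ratio(dense_b, active_b):
    """Approximate per-token compute ratio of a dense model to a sparse one.

    Assumption: compute per token is proportional to active parameters.
    """
    return dense_b / active_b


MELLUM2_TOTAL_B = 12.0   # total parameters, from the announcement
MELLUM2_ACTIVE_B = 2.5   # active parameters per token, from the announcement

fraction = active_fraction(MELLUM2_ACTIVE_B, MELLUM2_TOTAL_B)

# Dense model sizes mentioned in the article: 12B, 13B (CodeLlama), 14B.
ratios = {size: dense_to_sparse_ratio(size, MELLUM2_ACTIVE_B) for size in (12.0, 13.0, 14.0)}

print(f"active fraction: {fraction:.1%}")
for size, ratio in ratios.items():
    print(f"dense {size:.0f}B vs Mellum2: about {ratio:.1f}x the active parameters")
```

Under this assumption the ratio falls between about 4.8x and 5.6x depending on which dense model is used as the baseline, which is consistent with the rounded "5x" claim.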
Strategic Implications
JetBrains, primarily known for IDEs like IntelliJ and PyCharm, is now competing in the open-weight model space. By releasing Mellum2 under an open-source license, the company positions itself as an infrastructure provider for on-device and agentic coding assistants, potentially tying model performance to its own tooling ecosystem. This mirrors Microsoft's strategy with Phi-3 but from a tools-first angle.
Limitations and Open Questions
JetBrains has not disclosed training data provenance, tokenizer details, or context window length. The claim of being "competitive with 14B models" lacks specific benchmarks—no HumanEval pass@1, MBPP, or MMLU scores were released. Independent evaluation is needed to verify the efficiency claims.
What to watch
Watch for independent benchmark evaluations on HumanEval and SWE-Bench, and whether JetBrains integrates Mellum2 into its IDEs as a local code assistant—potentially challenging GitHub Copilot's market share.






