gentic.news — AI News Intelligence Platform

A laptop screen displays JetBrains IDE with Mellum2 AI code generation, a developer typing at keyboard

JetBrains Open-Sources Mellum2: 12B MoE at 2.5B Active Params

JetBrains open-sourced Mellum2, a 12B MoE model with 2.5B active params, trained from scratch for code and reasoning.

1d ago · 2 min read · AI-Generated
What is JetBrains' Mellum2 model?

JetBrains open-sourced Mellum2, a 12B-parameter Mixture-of-Experts model with 2.5B active parameters per token, trained from scratch for code and reasoning tasks.

TL;DR

JetBrains released Mellum2, a 12B MoE model. · Uses only 2.5B active parameters per token. · Trained from scratch; competitive with 14B dense models.

JetBrains open-sourced Mellum2, a 12B-parameter Mixture-of-Experts model with 2.5B active parameters per token. The model targets code generation and agentic reasoning while running at roughly one-fifth the compute of a dense 12B model.

Key facts

  • 12B total parameters, 2.5B active per token.
  • Trained from scratch, not fine-tuned.
  • Designed for code, reasoning, and agentic workflows.
  • Claimed to be competitive with dense 14B models at roughly one-fifth the per-token compute.
  • Open-sourced by JetBrains (IDE maker).

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model that activates only 2.5B parameters per token, according to @HuggingPapers. The model was trained from scratch—not fine-tuned from an existing checkpoint—and is designed for code, reasoning, and fast agentic workflows.

Architecture and Training

Mellum2 uses a sparse MoE architecture where each token routes to a subset of experts, keeping inference compute low. With 2.5B active parameters, it is competitive with dense 14B models, JetBrains claims. The training data mix and compute budget were not disclosed, but the model was trained from scratch, suggesting a significant upfront investment.
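To make the routing idea concrete, here is a minimal sketch of top-k expert routing for a single token. JetBrains has not described Mellum2's router, so everything specific here is an assumption for illustration: the expert count (`NUM_EXPERTS`), the number of experts chosen per token (`TOP_K`), softmax gating, weight renormalisation, and the toy "experts" that simply scale their input.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token.

    Returns [(expert_index, weight), ...] with weights renormalised
    so that they sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]


def moe_layer(token, experts, router_logits, top_k):
    """Run only the selected experts and mix their outputs by gate weight."""
    out = [0.0] * len(token)
    for idx, weight in route_token(router_logits, top_k):
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Hypothetical sizes, NOT Mellum2's real configuration.
NUM_EXPERTS = 8
TOP_K = 2

# Toy experts: expert s multiplies every feature by s.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, NUM_EXPERTS + 1)]

random.seed(0)
token = [1.0, -2.0, 0.5]
router_logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

selected = route_token(router_logits, TOP_K)
output = moe_layer(token, experts, router_logits, TOP_K)
print("selected experts and weights:", selected)
print("layer output:", output)
```

Only `TOP_K` of the `NUM_EXPERTS` expert functions run for the token, which is why per-token compute tracks active parameters rather than total parameters.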

Comparison to Prior Art

Mellum2 follows the same sparse-MoE approach as Mixtral 8x7B (about 46.7B total parameters, about 12.9B active per token) but at a much smaller scale, and it activates a smaller share of its weights for each token: roughly 21% versus roughly 28%. Compared to CodeLlama 13B (dense, so all 13B parameters are active), Mellum2 uses roughly one-fifth the active parameters per token while targeting similar code-generation quality. JetBrains did not publish benchmark scores against HumanEval or SWE-Bench, so third-party validation is pending.
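The "roughly 5x" figure can be checked with back-of-the-envelope arithmetic. The sketch below uses only the parameter counts quoted in this article and treats active parameters as a proxy for per-token compute. That proxy is an assumption: it ignores attention cost, router overhead, and memory bandwidth. The helper names are made up for illustration.

```python
# Figures quoted in the article, in billions of parameters.
MELLUM2_TOTAL_B = 12.0
MELLUM2_ACTIVE_B = 2.5


def active_fraction(active_b, total_b):
    """Share of the model's weights used for each token."""
    return active_b / total_b


def dense_to_sparse_ratio(dense_b, active_b):
    """How many times more parameters a dense model touches per token."""
    return dense_b / active_b


fraction = active_fraction(MELLUM2_ACTIVE_B, MELLUM2_TOTAL_B)

dense_sizes = {"dense 12B": 12.0, "dense 13B": 13.0, "dense 14B": 14.0}
ratios = {
    name: dense_to_sparse_ratio(size, MELLUM2_ACTIVE_B)
    for name, size in dense_sizes.items()
}

print(f"Mellum2 active fraction: {fraction:.1%}")
for name, ratio in ratios.items():
    print(f"{name} vs Mellum2 active params: {ratio:.1f}x")
```

The ratios come out between about 4.8x and 5.6x depending on which dense baseline is used, consistent with the rounded "5x" claim. The saving applies to compute only: all 12B weights still have to be held in memory, since any expert may be selected for a given token.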

Strategic Implications

JetBrains, primarily known for IDEs like IntelliJ and PyCharm, is now competing in the open-weight model space. By releasing Mellum2 under an open-source license, the company positions itself as an infrastructure provider for on-device and agentic coding assistants, potentially tying model performance to its own tooling ecosystem. This mirrors Microsoft's strategy with Phi-3 but from a tools-first angle.

Limitations and Open Questions

JetBrains has not disclosed training data provenance, tokenizer details, or context window length. The claim of being "competitive with 14B models" lacks specific benchmarks—no HumanEval pass@1, MBPP, or MMLU scores were released. Independent evaluation is needed to verify the efficiency claims.

What to watch

Watch for independent benchmark evaluations on HumanEval and SWE-Bench, and whether JetBrains integrates Mellum2 into its IDEs as a local code assistant—potentially challenging GitHub Copilot's market share.

Source: gentic.news

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

Mellum2's release is a strategic move by JetBrains to extend its tooling ecosystem into the model layer. By open-sourcing a sparse MoE model optimized for code, JetBrains can offer local, low-latency code completion without relying on third-party APIs, potentially undercutting GitHub Copilot's cloud-dependent model. The 2.5B active parameter count is notable: it suggests JetBrains prioritized inference speed and on-device deployment over raw benchmark chasing.

However, the lack of published benchmarks is a red flag. JetBrains claims competitiveness with 14B dense models, but without HumanEval or SWE-Bench scores, the claim is unverifiable. This mirrors the pattern of many open-weight releases where efficiency claims outpace reproducible evidence.

The sparse MoE architecture itself is not novel (Mixtral and DeepSeek MoE have similar designs), but JetBrains' training-from-scratch approach differentiates it from fine-tuned derivatives. The real test will be whether Mellum2 can match CodeLlama 13B or DeepSeek-Coder 6.7B on standard coding benchmarks at a fraction of the compute.
