gentic.news — AI News Intelligence Platform

mixture of experts

30 articles about mixture of experts in AI news

The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference

A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.

75% relevant
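The summary does not give the paper's 'qs inequality', but the inference-side penalty it describes can be illustrated with a back-of-envelope sketch: during batched, memory-bandwidth-bound decoding, tokens in a batch route to different experts, so a sparse MoE can be forced to stream far more weight bytes per step than its active-parameter count suggests. The uniform-routing assumption and the 120B/12B/64-expert configuration below are illustrative, not taken from the paper.

```python
# Illustrative sketch (not the paper's 'qs inequality'): estimate how many
# parameters a decode step must read from memory when batch tokens scatter
# across experts, assuming uniform routing.

def weights_read_per_step(total_params, active_params, n_experts, top_k, batch):
    """Approximate parameters streamed from memory in one decode step."""
    expert_params = total_params / n_experts
    # Expected number of distinct experts hit by the batch, assuming each
    # token independently picks top_k of n_experts uniformly at random.
    p_unused = (1 - top_k / n_experts) ** batch
    experts_hit = n_experts * (1 - p_unused)
    return experts_hit * expert_params

dense = weights_read_per_step(12e9, 12e9, 1, 1, 32)   # dense 12B model
moe = weights_read_per_step(120e9, 12e9, 64, 4, 32)   # 120B MoE, 12B active
print(f"MoE reads ~{moe / dense:.1f}x more weight bytes per step "
      f"than a dense model with the same active parameter count")
```

Under these toy numbers the MoE touches most of its experts every step, which is one plausible source of the training-versus-inference efficiency gap the paper reports.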

NVIDIA Nemotron 3 Super: 120B Hybrid Mamba-Transformer MoE with 1M Context

NVIDIA has released Nemotron 3 Super, a 120B parameter open hybrid Mamba-Transformer Mixture of Experts model with 12B active parameters and 1M token context length. The company claims it delivers up to 7.5x higher throughput than similar open models.

95% relevant

Alibaba's Qwen3.5: The Efficiency Breakthrough That Could Democratize Multimodal AI

Alibaba has open-sourced Qwen3.5, a multimodal AI model that combines linear attention with sparse Mixture of Experts architecture to deliver high performance without exorbitant computational costs, potentially making advanced AI more accessible.

85% relevant

Nvidia Claims MLPerf Inference v6.0 Records with 288-GPU Blackwell Ultra Systems, Highlights 2.7x Software Gains

MLCommons released MLPerf Inference v6.0 results, introducing multimodal and video model tests. Nvidia set records using 288-GPU Blackwell Ultra systems and achieved a 2.7x performance jump on DeepSeek-R1 via software optimizations alone.

95% relevant

Google's TurboQuant AI Research Report Sparks Sell-Off in Micron, Samsung, and SK Hynix Memory Stocks

The publication of Google's TurboQuant research blog post triggered an immediate market reaction, with shares of major memory manufacturers dropping 2-4% as investors anticipate AI-driven efficiency gains reducing future memory demand.

85% relevant

Google's Gemma 4 Emerges: The Next Generation of Open AI Models

Google has announced the upcoming release of Gemma 4, the next iteration of its open-source AI model family. This development signals Google's continued commitment to accessible AI technology and intensified competition in the open model space.

85% relevant

Brain-OF: The First Unified AI Model That Reads Multiple Brain Signals Simultaneously

Researchers have developed Brain-OF, the first omnifunctional foundation model that jointly processes fMRI, EEG, and MEG brain signals. This unified approach overcomes previous single-modality limitations by integrating complementary spatiotemporal data through innovative architecture and pretraining techniques.

80% relevant

Alibaba Qwen3.6-35B-A3B: 3B-Active Sparse MoE Hits 73.4% on SWE-Bench

Alibaba released Qwen3.6-35B-A3B, a sparse mixture-of-experts model with 35B total but only 3B active parameters. It shows significant gains over its predecessor, scoring 73.4% on SWE-bench Verified and beating Claude 3.5 Sonnet on several vision tasks.

97% relevant
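The "35B total but only 3B active" split comes from sparse top-k routing: every token scores all experts, but only the top-k actually run. A minimal sketch of that mechanism, with illustrative shapes and expert counts that are not Qwen3.6's actual configuration:

```python
# Minimal top-k mixture-of-experts layer: a router scores all experts,
# but only the top_k highest-scoring experts execute for a given token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2

router_w = rng.standard_normal((d_model, n_experts))           # gating weights
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(x):
    """x: (d_model,) -> (d_model,), running only top_k of n_experts."""
    logits = x @ router_w
    top = np.argsort(logits)[-top_k:]                          # chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()    # renormalized
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

y = moe_layer(rng.standard_normal(d_model))
print(y.shape)  # (64,)
```

With top_k=2 of 8 experts, only a quarter of the expert weights participate in any one token's forward pass, which is how total and active parameter counts diverge.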

MiniMax M2.7 Tops Open LLM Leaderboard with 230B Parameter Sparse Model

MiniMax announced its M2.7 model has taken the top spot on the Hugging Face Open LLM Leaderboard. The model uses a sparse mixture-of-experts architecture with 230B total parameters but only activates 10B per token.

85% relevant

Cursor AI Claims 1.84x Faster MoE Inference on NVIDIA Blackwell GPUs

Cursor AI announced a rebuilt inference engine for Mixture-of-Experts models on NVIDIA's new Blackwell GPUs, resulting in a claimed 1.84x speedup and improved output accuracy.

85% relevant

Google Releases Gemma 4 Family Under Apache 2.0, Featuring 2B to 31B Models with MoE and Multimodal Capabilities

Google has released the Gemma 4 family of open-weight models, derived from Gemini 3 technology. The four models, ranging from 2B to 31B parameters and including a Mixture-of-Experts variant, are available under a permissive Apache 2.0 license and feature multimodal processing.

100% relevant

Kimi 2.5's 1T Parameter MoE Model Runs on 96GB Mac Hardware via SSD Streaming

Developers have demonstrated that Kimi 2.5's 1 trillion parameter Mixture-of-Experts model can run on Mac hardware with just 96GB RAM by streaming expert weights from SSD, with only 32B parameters active per token.

85% relevant
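The trick described above works because only a small slice of experts is needed per token, so the rest can live on disk. A minimal sketch of an LRU expert cache in that spirit; `loader` is a hypothetical stand-in for real SSD weight I/O, and the capacity is illustrative rather than Kimi 2.5's actual layout:

```python
# Sketch of SSD-streamed experts: keep only recently used experts in RAM,
# load the rest on demand, and evict the least recently used on overflow.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, capacity, loader):
        self.capacity = capacity          # max experts resident in RAM
        self.loader = loader              # hypothetical SSD weight reader
        self.cache = OrderedDict()        # expert_id -> weights (LRU order)

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)      # mark as recently used
        else:
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)     # evict least recently used
            self.cache[expert_id] = self.loader(expert_id)
        return self.cache[expert_id]

# Toy loader standing in for SSD reads.
cache = ExpertCache(capacity=2, loader=lambda eid: f"weights[{eid}]")
cache.get(0); cache.get(1); cache.get(0); cache.get(2)   # evicts expert 1
print(sorted(cache.cache))  # [0, 2]
```

The cost of this scheme is routing-dependent latency: a cache miss stalls on SSD bandwidth, which is why only models with a small active fraction are practical to run this way.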

Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks

Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.

95% relevant

NVIDIA Releases Nemotron-Cascade 2: A 30B MoE Model with 3B Active Parameters

NVIDIA has open-sourced Nemotron-Cascade 2, a 30B parameter Mixture-of-Experts model that activates only 3B parameters per token. It claims 'gold medal performance' on IMO and IOI 2025 benchmarks.

95% relevant

Beyond Homogenization: How Expert Divergence Learning Unlocks MoE's True Potential

Researchers have developed Expert Divergence Learning, a novel pre-training strategy that combats expert homogenization in Mixture-of-Experts language models. By encouraging functional specialization through domain-aware routing, the method improves performance across benchmarks with minimal computational overhead.

75% relevant
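The paper's exact loss is not given in the summary, but one common way to discourage expert homogenization, in the spirit of Expert Divergence Learning, is a regularizer that penalizes pairwise similarity between expert weights. A hedged sketch of that idea:

```python
# Sketch of a divergence-style regularizer: penalize mean pairwise cosine
# similarity between expert weight matrices so experts are pushed apart.
import numpy as np

def divergence_penalty(experts):
    """Mean pairwise cosine similarity of flattened expert weights."""
    flats = [e.ravel() / np.linalg.norm(e.ravel()) for e in experts]
    sims = [flats[i] @ flats[j]
            for i in range(len(flats)) for j in range(i + 1, len(flats))]
    return float(np.mean(sims))   # add this (scaled) to the training loss

rng = np.random.default_rng(2)
identical = [np.ones((4, 4))] * 3
distinct = [rng.standard_normal((4, 4)) for _ in range(3)]
print(divergence_penalty(identical))   # 1.0 -- fully homogenized
print(divergence_penalty(distinct))    # small -- experts already differ
```

Adding such a term to the pre-training objective makes collapsed, near-identical experts costly, complementing the domain-aware routing the summary describes.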

Video of Massive AI Training Lab in China Sparks Debate on Automation's Scale

A social media post showcasing a vast Chinese AI training lab has reignited discussions about job displacement, underscoring the tangible infrastructure powering the current AI surge.

85% relevant

Ethan Mollick Declares End of 'RAG Era' as Dominant Paradigm for AI Agents

AI researcher Ethan Mollick declared that the 'RAG era' for supplying context to AI agents has ended, marking a significant architectural shift in how advanced AI systems process information.

75% relevant

AI Bridges the Gap Between Data and Discovery: New Framework Aligns Scientific Observations with Decades of Literature

Researchers have developed a novel AI framework that aligns X-ray spectra with scientific literature using contrastive learning. This multimodal approach improves physical variable estimation by 16-18% and identifies high-priority astronomical targets, demonstrating how AI can accelerate scientific discovery by connecting data with domain knowledge.

75% relevant

AI Chip Capacity Crisis: 10GW Left Through 2030, Prices Up Double Digits

The AI accelerator market has only 10 gigawatts of capacity left to contract through 2030, with 100GW already under contract. Prices are rising by double digits as one competitor has stopped taking orders entirely.

97% relevant

LoopCTR: A New 'Loop Scaling' Paradigm for Efficient CTR Models

A new research paper introduces LoopCTR, a method for scaling Transformer-based CTR models by recursively reusing shared layers during training. This 'train-multi-loop, infer-zero-loop' approach achieves state-of-the-art performance with lower deployment costs, directly addressing a core industrial constraint in recommendation systems.

92% relevant
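The "train-multi-loop, infer-zero-loop" idea can be sketched as weight tying: one shared block is applied repeatedly during training (extra depth without extra parameters), then applied fewer times at deployment. The block, loop counts, and residual form below are illustrative; LoopCTR's actual architecture is not specified in the summary.

```python
# Sketch of loop scaling: a single shared block is reused n_loops times,
# so looping adds depth without adding parameters.
import numpy as np

rng = np.random.default_rng(1)
d = 16
shared_w = rng.standard_normal((d, d)) * 0.1   # one set of weights, reused

def block(x):
    return np.tanh(x @ shared_w) + x           # residual keeps loops stable

def forward(x, n_loops):
    for _ in range(n_loops):                   # same parameters every pass
        x = block(x)
    return x

x = rng.standard_normal(d)
deep = forward(x, n_loops=4)      # training-time: looped depth
shallow = forward(x, n_loops=1)   # deploy-time: cheaper single pass
print(deep.shape, shallow.shape)  # (16,) (16,)
```

Because the looped and single-pass networks share the same weights, deployment cost can be cut after training without retraining a separate shallow model, which is the industrial constraint the paper targets.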

Anthropic CEO Dario Amodei: China Will Match Mythos AI Within a Year

Anthropic CEO Dario Amodei stated China will replicate the capabilities of Anthropic's advanced 'Mythos' AI project within 12 months. He also sees no near-term slowdown in AI progress.

85% relevant

Anthropic's Opus 4.7 Shows Sustained Gains on Economically Critical Tasks

Ethan Mollick highlights that Anthropic's latest Claude Opus 4.7 model shows measurable performance gains on economically important tasks, continuing a rapid two-month release cycle with no signs of plateau.

99% relevant

Ethan Mollick: AI Bottleneck Theory Explains Sudden Capability Jumps

Wharton professor Ethan Mollick posits that incremental AI improvements can cause sudden, large jumps in practical ability when they remove a critical bottleneck in a workflow. This explains why progress often appears non-linear.

85% relevant

India's Human Motion Farms Train Humanoid Robots with First-Person Hand Data

Labs in India are capturing detailed human motion data—focusing on grip, force, and error recovery—to train AI models for humanoid robots. This addresses the critical bottleneck of acquiring physical intelligence data for robotics.

89% relevant

OpenAI Forecasts $121B in AI Hardware Costs for 2028

OpenAI is forecasting its own AI research hardware costs will reach $121 billion in 2028, according to a WSJ report. This figure highlights the extreme capital intensity required to compete at the frontier of AI.

85% relevant

xAI's Grok 4.2 at 0.5T Params, Colossus 2 Training Models up to 10T

A tweet from AI researcher Rohan Paul states xAI's current Grok 4.2 model uses 0.5 trillion parameters. In parallel, the Colossus 2 project is training a suite of seven models ranging from 1 trillion to 10 trillion parameters.

85% relevant

Meta's New Training Recipe: Small Models Should Learn from a Single Expert

Meta AI researchers propose a novel training recipe for small language models: instead of learning from many large 'expert' models simultaneously, they should be trained sequentially on one expert at a time. This method, detailed in a new paper, reportedly improves final model performance and training efficiency.

85% relevant

Mistral AI Teases 'New Model Tomorrow' in Cryptic Tweet

Mistral AI co-founder Arthur Mensch tweeted 'new model tomorrow!?!', signaling an imminent release. This follows their pattern of rapid, often surprise, model deployments.

85% relevant

Claude Mythos Priced 5x Higher Than Claude Opus 4.6

Anthropic's newly detailed Claude Mythos model is priced at 5x the cost of Claude Opus 4.6. This premium pricing strategy suggests a focus on high-value enterprise use cases over raw performance-per-dollar.

81% relevant

Mythos AI Model Reportedly 'Destroys' Benchmarks in Early Leak

A viral tweet claims the unreleased Mythos AI model 'destroys every other model' based on leaked benchmarks. No official confirmation or technical details are available.

85% relevant