reasoning models

30 articles about reasoning models in AI news

Video Reasoning Models Use Chain-of-Steps in Diffusion Denoising, Not Cross-Frame Analysis

New research reveals video reasoning models don't analyze frames sequentially but instead use a Chain-of-Steps mechanism within diffusion denoising, developing emergent working memory and self-correction.

Mar 18, 2026 | 85% relevant

Research: Cheaper Reasoning Models Can Cost 3x More Due to Higher Error Rates and Retry Loops

New research indicates that selecting AI models based solely on per-token pricing can be a false economy. Models with lower accuracy often require multiple expensive retries, which can raise total costs to as much as 3x.

Mar 29, 2026 | 87% relevant

Microsoft's MEMENTO Method Reduces LLM Reasoning Memory by 3x

Microsoft researchers introduced MEMENTO, a method where LLMs generate structured 'notes' during multi-step reasoning, reducing the memory footprint of the reasoning process by 3x while maintaining performance. This addresses a key bottleneck in deploying complex reasoning models.

Apr 16, 2026 | 80% relevant

Reasoning Training Fails to Improve Embedding Quality: Study Finds No Transfer to General Language Understanding

Research shows that training AI models for step-by-step reasoning does not improve their ability to create semantic embeddings for search or general QA. Advanced reasoning models perform identically to base models on standard retrieval benchmarks.

Mar 21, 2026 | 85% relevant

AI Reasoning Costs Plummet: 1000x Price Drop Signals Dawn of Accessible Intelligence

The cost of running advanced AI reasoning models has collapsed by 1000x in just 16 months, revealing unprecedented efficiency gains beyond raw model improvements. This dramatic reduction suggests we're still in early stages of AI development with massive optimization potential remaining.

Mar 12, 2026 | 85% relevant

NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding

NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to dramatically accelerate inference for Moonshot's reasoning models. This breakthrough promises blazing-fast performance while maintaining accuracy.

Mar 12, 2026 | 89% relevant

Mercury 2: The End of Autoregressive Thinking in AI Reasoning

Mercury 2 represents a paradigm shift in AI reasoning architecture, moving beyond traditional autoregressive generation to create native reasoning models that process information simultaneously rather than sequentially.

Feb 24, 2026 | 85% relevant

Sarvam AI's Open-Source Models Signal India's Arrival in Global AI Race

Sarvam AI has open-sourced two reasoning models—Sarvam 30B and 105B—positioning India as a competitive player in global AI. The breakthrough lies not just in benchmark scores but in a full-stack approach: in-house data, training, RL, tokenizer design, and optimized inference for both frontier GPUs and consumer devices.

Mar 6, 2026 | 85% relevant

OpenAI Targets First 'AI Intern' by September, Autonomous Researchers by 2028

OpenAI plans to deploy its first 'AI intern' by September and aims for a full autonomous research system by 2028. The effort builds on reasoning models and agent systems like Codex, which have shown dramatic productivity gains but still face reliability and safety challenges.

Mar 20, 2026 | 95% relevant

ViGoR-Bench Exposes 'Logical Desert' in SOTA Visual AI: 20+ Models Fail Physical, Causal Reasoning Tasks

Researchers introduce ViGoR-Bench, a unified benchmark testing visual generative models on physical, causal, and spatial reasoning. It reveals significant deficits in over 20 leading models, challenging the 'performance mirage' of current evaluations.

Mar 30, 2026 | 94% relevant

The Reasoning Transparency Gap: AI Models Can't Control Their Own Thought Processes

New research reveals AI models can control their final answers 62% of the time but only control their reasoning chains 3% of the time, exposing fundamental limitations in how these systems monitor their own thought processes.

Mar 14, 2026 | 85% relevant

AI's Hidden Capabilities: How Simple Prompts Unlock Advanced Reasoning in Language Models

New research reveals that large language models possess latent reasoning abilities that can be activated through specific prompting techniques, fundamentally changing how we understand AI capabilities and their potential applications.

Mar 9, 2026 | 85% relevant

OpenAI's New Safety Metric Reveals AI Models Struggle to Control Their Own Reasoning

OpenAI has introduced 'CoT controllability' as a new safety metric, revealing that AI models like GPT-5.4 Thinking struggle to deliberately manipulate their own reasoning processes. The company views this limitation as encouraging for AI safety, suggesting models lack dangerous self-modification capabilities.

Mar 6, 2026 | 75% relevant

AI Research Breakthroughs: From Video Reasoning to Self-Stopping Models

This week's top AI papers reveal major advances in video understanding, reasoning efficiency, and agent training. Researchers introduced a massive video reasoning dataset, models that know when to stop thinking, and techniques for improving AI agents without full retraining.

Mar 1, 2026 | 95% relevant

BioBridge AI Merges Protein Science with Language Models for Breakthrough Biological Reasoning

Researchers introduce BioBridge, a novel AI framework that combines protein language models with general-purpose LLMs to enable enhanced biological reasoning. The system achieves state-of-the-art performance on protein benchmarks while maintaining general language understanding capabilities.

Feb 23, 2026 | 75% relevant

AI's Causal Reasoning Gap: New Method Tests How Well Models Understand 'What If' Scenarios

Researchers introduce Double Counterfactual Consistency (DCC), a training-free method to evaluate and improve LLMs' causal reasoning. The technique reveals significant weaknesses in how models handle hypothetical scenarios and counterfactual thinking, addressing a critical limitation in current AI systems.

Feb 20, 2026 | 75% relevant

Logitext Bridges the Gap Between Language Models and Logical Reasoning

Researchers introduce Logitext, a neurosymbolic framework that treats LLM reasoning as an SMT theory, enabling joint textual-logical analysis of partially structured documents. The system improves accuracy on content moderation and legal reasoning tasks.

Feb 23, 2026 | 70% relevant

ThermoQA Benchmark Reveals LLM Reasoning Gaps: Claude Opus Leads at 94.1%

Researchers released ThermoQA, a 293-question benchmark testing thermodynamic reasoning. Claude Opus 4.6 scored 94.1% overall, but models showed significant degradation on complex cycle analysis versus simple property lookups.

Apr 23, 2026 | 78% relevant

NVIDIA's Audio Flamingo Next: 30-Min Audio, Time-Grounded Reasoning

NVIDIA has launched Audio Flamingo Next, a next-generation open audio-language model supporting 30-minute audio inputs and time-grounded reasoning. Trained on over 1 million hours of data, it reportedly outperforms larger models on key audio understanding benchmarks.

Apr 19, 2026 | 95% relevant

Massive Video Reasoning Dataset Released, Reportedly 1000x Larger Than Predecessors

An unverified report claims the release of a video reasoning dataset roughly 1000x larger than existing benchmarks. If true, it would be a significant resource for training next-generation video understanding models.

Apr 8, 2026 | 99% relevant

Study Finds LLM 'Brain Activity' Collapses Under Hard Questions, Revealing Internal Reasoning Limits

New research shows language models' internal activation patterns shrink and simplify when faced with difficult reasoning tasks, suggesting they may rely on shortcuts rather than deep reasoning. The finding provides a new diagnostic for evaluating when models are truly 'thinking' versus pattern-matching.

Mar 31, 2026 | 85% relevant

Luma Labs Launches Uni-1: An Autoregressive Transformer for Image Generation with a Pre-Generation Reasoning Phase

Luma Labs has released Uni-1, a foundational image model that uses an autoregressive transformer to reason about user intent before generating pixels. It aims to address the 'intent gap' common in diffusion models by adding a structured reasoning step.

Mar 24, 2026 | 88% relevant

Fine-Tuning Gemma 3 1B-IT for Financial Reasoning with QLoRA

A technical guide details using QLoRA and reasoning-augmented data to fine-tune Google's Gemma 3 1B-IT model for financial analysis. This demonstrates a method to specialize small language models for complex, domain-specific tasks.

Mar 18, 2026 | 89% relevant

ReasonGR: A Framework for Multi-Step Semantic Reasoning in Generative Retrieval

Researchers propose ReasonGR, a framework to enhance generative retrieval models' ability to handle complex, numerical queries requiring multi-step reasoning. Tested on financial QA, it improves accuracy for tasks like analyzing reports.

Mar 16, 2026 | 80% relevant

Anthropic Surpasses Google in Extended Context AI, Redefining Long-Form Reasoning

Anthropic's Claude has reportedly outperformed Google's models in maintaining attention and reasoning across extended contexts, marking a significant shift in the AI landscape where context length has become a critical competitive frontier.

Mar 14, 2026 | 87% relevant

Sam Altman Envisions AI That Thinks for Days: The Dawn of Super-Long-Term Reasoning

OpenAI CEO Sam Altman predicts future AI models will perform "super long-term reasoning," spending days or weeks analyzing complex, high-stakes problems. This represents a fundamental shift from today's rapid-response systems toward deliberate, extended cognitive processes.

Mar 13, 2026 | 85% relevant

Teaching AI to Forget: How Reasoning-Based Unlearning Could Revolutionize LLM Safety

Researchers propose a novel 'targeted reasoning unlearning' method that enables large language models to selectively forget specific knowledge while preserving general capabilities. This approach addresses critical safety, copyright, and privacy concerns in AI systems through explainable reasoning processes.

Mar 12, 2026 | 93% relevant

AI's Hidden Reasoning Flaw: New Framework Tackles Multimodal Hallucinations at Their Source

Researchers introduce PaLMR, a novel framework that addresses a critical weakness in multimodal AI: 'process hallucinations,' where models give correct answers but for the wrong visual reasons. By aligning both outcomes and reasoning processes, PaLMR significantly improves visual reasoning fidelity.

Mar 10, 2026 | 75% relevant

Meta's Breakthrough: Structured Reasoning Cuts AI Code Errors by Half

Meta researchers discovered that forcing AI models to show step-by-step reasoning with proof reduces code patch error rates by nearly 50%. This simple structured prompting technique achieves 93% accuracy without expensive retraining.

Mar 7, 2026 | 95% relevant

DeepVision-103K: The Math Dataset That Could Revolutionize AI's Visual Reasoning

Researchers have introduced DeepVision-103K, a comprehensive mathematical dataset with 103,000 verifiable visual instances designed to train multimodal AI models. Covering K-12 topics from geometry to statistics, this dataset addresses critical gaps in AI's visual reasoning capabilities.

Mar 1, 2026 | 85% relevant
