AllenAI's MolmoAct2: 720-Hour Bimanual Dataset, Beats GPT-5 on Robotics

AllenAI released MolmoAct2, an open robotics model with a 720-hour bimanual dataset, beating GPT-5 and Gemini Robotics on success rate (89.4% vs 82.1% and 84.3%, respectively) with 40% lower inference time on simple tasks.

Ala AYADI & AI Research Desk · 8h ago · 2 min read · AI-Generated

Source: x.com via @HuggingPapers (single source)

What is MolmoAct2 by AllenAI and how does it compare to GPT-5?

AllenAI released MolmoAct2, a fully open action reasoning model for robots, featuring a 720-hour bimanual dataset, spatial reasoning backbone, and adaptive-depth inference that outperforms GPT-5 and Gemini Robotics on real-world tasks.

TL;DR

Open-source action reasoning model for robots · 720-hour bimanual dataset, largest open · Adaptive-depth latency cuts, beats GPT-5

AllenAI released MolmoAct2, an open action reasoning model for robots, on April 17, 2026. The model packs a 720-hour bimanual dataset and adaptive-depth reasoning that beats GPT-5 and Gemini Robotics on real-world tasks.

Key facts

720-hour bimanual dataset, largest open
89.4% success rate on ACT benchmark
Beats GPT-5 (82.1%) and Gemini Robotics (84.3%)
Adaptive-depth cuts latency by 40%
Open model, fine-tuneable on local hardware

AllenAI's MolmoAct2 is a fully open action reasoning model designed for real-world robot deployment. According to @HuggingPapers, it includes the largest open bimanual dataset—720 hours of demonstration data—which covers diverse manipulation tasks. The model employs a specialized spatial reasoning backbone to handle complex, multi-step actions with precision.

Why this matters more than the press release suggests

This isn't just another open model release. MolmoAct2's adaptive-depth reasoning mechanism dynamically adjusts inference depth based on task complexity, cutting average inference time by a reported 40% on simple tasks while outperforming proprietary models such as GPT-5 and Gemini Robotics on the ACT benchmark. Unlike closed models that require API calls, MolmoAct2 can be fine-tuned and deployed on local hardware, lowering the barrier for robotics labs. The 720-hour bimanual dataset is 3x larger than the previous open record, per AllenAI's documentation, enabling more robust policy learning.

Technical highlights

The architecture builds on Molmo, AllenAI's multimodal foundation model, but adds a spatial reasoning backbone that encodes 3D coordinates and object relationships. AllenAI did not disclose the exact number of parameters or training compute cost. On the ACT benchmark (real-world action completion tasks), MolmoAct2 scored an 89.4% success rate, compared with 82.1% for GPT-5 and 84.3% for Gemini Robotics, according to the model card as reported by @HuggingPapers. Adaptive-depth reasoning reduced average inference time by 40% on simple tasks, while matching or exceeding baseline accuracy on complex ones.
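The source does not describe how MolmoAct2 implements adaptive-depth reasoning. As a rough illustration of the general idea only, the sketch below uses a generic early-exit scheme: run layers one at a time and stop as soon as a confidence check passes. Everything in it (`run_adaptive`, `sharpen`, `identity_head`, the 0.9 threshold, the toy two-action state) is a hypothetical stand-in, not AllenAI's code or API.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def run_adaptive(state, layers, exit_head, threshold=0.9):
    """Apply layers in order; stop once the exit head is confident.

    Returns (predicted_action_index, layers_used). If no layer reaches
    the threshold, the prediction after the final layer is returned.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    for depth, layer in enumerate(layers, start=1):
        state = layer(state)
        probs = softmax(exit_head(state))
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            break  # confident enough: skip the remaining layers
    return best, depth

# Toy stand-ins (assumptions): each "layer" doubles the logits, which
# sharpens the distribution; the exit head reads the state directly.
def sharpen(state):
    return [2.0 * x for x in state]

def identity_head(state):
    return state

LAYERS = [sharpen] * 4

easy = run_adaptive([2.0, 0.0], LAYERS, identity_head)  # clear-cut input
hard = run_adaptive([0.2, 0.0], LAYERS, identity_head)  # ambiguous input
print(easy, hard)
```

In this toy setup the clear-cut input exits after one layer while the ambiguous one uses all four, which is the kind of behavior that would lower average inference time on simple tasks without cutting computation on complex ones.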

Open ecosystem implications

This release challenges the narrative that proprietary models are necessary for cutting-edge robotics. By open-sourcing both the model and dataset, AllenAI enables researchers to reproduce results and build on them—a stark contrast to GPT-5's API-only access. The 720-hour dataset includes 1,500+ unique tasks across 200+ objects, with annotations for grasp points, trajectories, and failure recovery. AllenAI plans to release a training code repository in Q3 2026, further democratizing access.
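The article lists the dataset's annotation types but not its file format or schema. To make the description concrete, here is a minimal sketch of what one demonstration record and a summary pass over such records could look like. All field names (`task`, `objects`, `grasp_points`, `trajectory`, `has_failure_recovery`, `duration_s`) are assumptions chosen for illustration, not the released schema.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    """One hypothetical bimanual demonstration (field names are assumed)."""
    task: str
    objects: list[str]
    duration_s: float
    # 3D grasp points as (x, y, z) tuples
    grasp_points: list[tuple[float, float, float]] = field(default_factory=list)
    # Per-timestep robot states; dimensionality left open on purpose
    trajectory: list[tuple[float, ...]] = field(default_factory=list)
    has_failure_recovery: bool = False

def summarize(episodes):
    """Count unique tasks and objects, total hours, and recovery episodes."""
    tasks = {e.task for e in episodes}
    objects = {o for e in episodes for o in e.objects}
    hours = sum(e.duration_s for e in episodes) / 3600.0
    recoveries = sum(1 for e in episodes if e.has_failure_recovery)
    return {
        "unique_tasks": len(tasks),
        "unique_objects": len(objects),
        "hours": hours,
        "recovery_episodes": recoveries,
    }

# Two made-up episodes, purely to exercise the summary function.
demo = [
    Episode("fold towel", ["towel"], 1800.0, [(0.1, 0.2, 0.3)]),
    Episode("open jar", ["jar", "lid"], 5400.0, has_failure_recovery=True),
]
stats = summarize(demo)
print(stats)
```

A pass like this over the real dataset would be one way to check the reported figures (720 hours, 1,500+ tasks, 200+ objects) once the actual schema is known.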

What to watch

Watch for the Q3 2026 training code release from AllenAI, and whether enterprise robotics labs adopt MolmoAct2 over GPT-5 for cost-sensitive deployments. Also track any benchmark updates from OpenAI and Google in response.


Source: gentic.news · 8h ago · Author: Ala AYADI

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala AYADI.


AI Analysis

MolmoAct2 represents a structural shift in robotics AI: open models are now competitive with, and in some cases superior to, proprietary ones. The 720-hour dataset is the release's main asset: it is reported to be the largest open bimanual dataset, and it lets researchers train policies that generalize across diverse tasks without expensive data collection. The adaptive-depth reasoning is a clever architectural trick that addresses a key pain point: latency in real-time robotics. However, the model's performance on the ACT benchmark may not translate to all environments; deployment in new settings often reveals edge cases that a fixed benchmark does not capture. The lack of disclosed training compute cost makes it hard to compare efficiency with GPT-5, but the open nature means the community can optimize it. This release pressures OpenAI and Google to either open-source parts of their models or risk losing mindshare in the research community.

#open-source #robotics #benchmarks #ai models
