Screenshot of a web browser interface showing MolmoWeb AI agent navigating a YouTube page, with highlighted action…

AI2's MolmoWeb: Open 8B-Parameter Web Agent Navigates Using Screenshots, Challenges Proprietary Systems

The Allen Institute for AI released MolmoWeb, a fully open web agent that operates websites using only screenshots. The 8B-parameter model outperforms other open models and approaches proprietary performance, with all training data and weights publicly released.

AI Research

AI2's MolmoWeb: Open 8B-Parameter Web Agent Navigates Using Screenshots, Challenges Proprietary Systems

Anthropic's Economic Index: Claude 3.5 Sonnet Usage Grows 50% After 2 Months, Outpacing Claude 3 Opus

Meta's Hyperagents Enable Self-Referential AI Improvement, Achieving 0.710 Accuracy on Paper Review

MinerU-Diffusion: A 2.5B Parameter Diffusion Model for OCR Achieves 3.2x Speedup Over Autoregressive Methods

Google DeepMind's 'Learning Through Conversation' Paper Shows LLMs Can Improve with Real-Time Feedback

LLM Multi-Agent Framework 'Shared Workspace' Proposed to Improve Complex Reasoning via Task Decomposition

Ego2Web Benchmark Bridges Egocentric Video and Web Agents, Exposing Major Performance Gaps

VHS: Latent Verifier Cuts Diffusion Model Verification Cost by 63.3%, Boosts GenEval by 2.7%

Halsted VLM: A 650,000-Video Surgical Atlas and Platform for Temporal Procedure Mapping

CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses

LeWorldModel: Yann LeCun's Team Achieves Stable World Model Training with 15M Parameters, No Training Tricks

Google's TurboQuant Cuts LLM KV Cache Memory by 6x, Enables 3-Bit Storage Without Accuracy Loss

OpenResearcher Paper Released: Method for Synthesizing Long-Horizon Research Trajectories for AI

Anthropic Economic Index: Claude Users Shift from Autonomy to Iteration, Attempt Higher-Value Tasks

Anthropic Deploys Multi-Agent Harness to Scale Claude's Frontend Design & Autonomous Software Engineering

Jensen Huang Predicts AI Training Shift to Synthetic Data, Compute as New Bottleneck

Meta's V-JEPA 2.1 Achieves +20% Robotic Grasp Success with Dense Feature Learning from 1M+ Hours of Video

Wharton Study Finds 'AI Writes, Humans Review' Model Failing in Real Business Contexts

LLMs Can Now De-Anonymize Users from Public Data Trails, Research Shows

Kimi 2.5's 1T Parameter MoE Model Runs on 96GB Mac Hardware via SSD Streaming

Mix-and-Match Pruning Framework Reduces Swin-Tiny Accuracy Degradation by 40% vs. Single-Criterion Methods

AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions

AgenticGEO: Self-Evolving AI Framework for Generative Search Engine Optimization Outperforms 14 Baselines

DST: Domain-Specialized Tree of Thought Cuts Computational Overhead by 26-75% with Plug-and-Play Predictors

DiffGraph: An Agent-Driven Graph Framework for Automated Merging of Online Text-to-Image Expert Models

Context Cartography: Formal Framework Proposes 7 Operators to Govern LLM Context, Moving Beyond 'More Tokens'

arXiv Survey Maps KV Cache Optimization Landscape: 5 Strategies for Million-Token LLM Inference

LLMs Show 'Privileged Access' to Own Policies in Introspect-Bench, Explaining Self-Knowledge via Attention Diffusion

Seed1.8 Model Card Released: A 1.8B Parameter Foundation Model for Generalized Real-World AI Agents

λ-RLM: 8B Parameter Model Using Typed λ-Calculus Beats 405B Performance on Long-Context Tasks