active perception
30 articles about active perception in AI news
EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation
Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses
Researchers introduce CanViT, the first task- and policy-agnostic Active-Vision Foundation Model (AVFM). It achieves 38.5% mIoU on ADE20K segmentation with a single low-resolution glimpse, outperforming prior active models while using 19.5x fewer FLOPs.
Swedish Study: Attractive Female Students' Grade Premium Vanished in Online Classes, Male Premium Persisted
A Swedish university study of 307 students found attractive female students received higher grades in subjective courses during in-person teaching, but this advantage disappeared when classes moved online. The male beauty premium remained, suggesting appearance-based bias in human grading.
CogSearch: A Multi-Agent Framework for Proactive Decision Support in E-Commerce Search
Researchers from JD.com introduce CogSearch, a cognitive-aligned multi-agent framework that transforms e-commerce search from passive retrieval to proactive decision support. Offline benchmarks and online A/B tests show significant improvements in conversion, especially for complex queries.
MIT's Proactive AI Agents: The Dawn of Autonomous Problem-Solving Systems
MIT researchers have developed proactive AI agents that can autonomously identify and solve problems without human prompting. This breakthrough represents a significant leap from reactive to anticipatory artificial intelligence systems.
Horizon Launches Full-Stack AI Platform for Autonomous Driving
Horizon Robotics launched a trio of products—a new chip, an open-source OS, and a smart driving system—aiming to push cars closer to becoming autonomous AI agents. The platform integrates hardware and software for enhanced perception and decision-making.
OpenAI Expands Codex into Desktop Agent with Vision & Memory
OpenAI has reportedly expanded its Codex model beyond code generation into a multimodal desktop agent that can see, click, type, and learn user habits. This signals a strategic move from an API tool into a proactive, personalized AI assistant.
Unipath Launches Household Robot, Joining China's Push into Consumer Robotics
Chinese company Unipath has launched a household robot. This marks another entry into the competitive consumer robotics market, where Chinese firms are increasingly active.
Snap Brings AI Lenses To Luxury Fashion Campaigns
Snapchat is integrating AI-powered augmented reality lenses into luxury fashion marketing campaigns, offering brands a new channel for immersive, interactive advertising directly within the app's ecosystem.
The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human
A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.
NVIDIA's Nemotron 3 Super: The Efficiency-First AI Model Redefining Performance Benchmarks
NVIDIA unveils Nemotron 3 Super, a 120B parameter model with only 12B active parameters using hybrid Mamba-Transformer MoE architecture. It achieves 1M token context, beats GPT-OSS-120B on intelligence metrics, and offers configurable reasoning modes for optimal compute efficiency.
R1's Real-Time World Model: The Paradigm Shift from Video Generation to World Generation
Rabbit's R1 introduces a real-time world model that continuously generates evolving environments rather than static video frames. This represents a fundamental shift from passive content creation to interactive world simulation, enabling seamless AI interactions without waiting or regeneration cycles.
SharpAP: New Attack Method Makes Recommender System Poisoning More
Researchers propose SharpAP, a poisoning attack that uses sharpness-aware minimization to generate fake user profiles that transfer better between different recommender system models, posing a more realistic threat.
Agent Harnessing: The Infrastructure That Makes AI Agents Work
A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination — is what separates production-grade agents from those that fail silently.
NVIDIA Open-Sources Motion Diffusion Model for Humanoid Robots
NVIDIA open-sourced Kimono, a motion diffusion model for humanoid robots, trained on 700 hours of motion capture data. It generates 3D human and robot motions from text prompts, supports keyframe and end-effector control, and runs on Unitree G1.
Dick's Sporting Goods Partners with Adobe to Launch Agentic AI 'Digital Coaches'
Dick's Sporting Goods announced a partnership with Adobe to implement agentic AI 'digital coaches.' These AI agents will provide personalized guidance to customers, aiming to enhance the shopping experience and drive sales.
Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure
Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.
Catching Drift Before It Catches You
The author details implementing the open-source Evidently AI library to monitor a Kafka-powered movie recommender for data drift. This is a hands-on guide to a fundamental MLOps task for maintaining live AI systems.
Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle
AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.
Google DeepMind Researcher: LLMs Can Never Achieve Consciousness
A Google DeepMind researcher has publicly argued that large language models, by their algorithmic nature, can never become conscious, regardless of scale or time. This stance challenges a core speculative narrative in AI discourse.
Unitree H1 Humanoid Robot Shifts from Jog to Run in Seconds
A new video shows Unitree's H1 humanoid robot accelerating from a jogging pace to a running gait in seconds, showcasing improved dynamic locomotion control.
Claude Code's Rust TUI Rewrite Eliminates UI Lag
A developer rebuilt Claude Code's terminal UI in Rust to fix performance issues with multiple agents, large diffs, and long tool-call chains—removing frontend friction that was slowing down the experience.
Meta Deploys Unified AI Agents to Manage Hyperscale Infrastructure
Meta's engineering team has built and deployed a system of unified AI agents to autonomously manage capacity and performance across its hyperscale infrastructure. This represents a significant shift from rule-based automation to AI-driven orchestration for one of the world's largest computing fleets.
Binghamton University Tests Robotic Guide Dog with Natural Language Interface
Researchers at Binghamton University have developed a robotic guide dog prototype that communicates with users using natural language. The system, built on a Unitree Go2 platform, was demonstrated navigating a user through a test environment.
Beijing Humanoid Robot Half Marathon Tests 40% Autonomous Teams
A night-time half-marathon test for humanoid robots in Beijing revealed approximately 40% of participating teams were running fully autonomous systems, a key benchmark for real-world robotic mobility.
U.K. Retail Loyalty Enters AI Era as M&S
Marks & Spencer, Tesco, and Boots are implementing AI to analyze customer data and deliver hyper-personalized rewards and offers within their loyalty programs. This marks a strategic shift from one-size-fits-all schemes to predictive, individualized engagement to boost retention and spending.
Anthropic Opus 4.7, ChatGPT Image 2 Rumored for Imminent Release
Analyst speculation suggests Anthropic's Claude Opus 4.7 and OpenAI's ChatGPT Image 2 could launch imminently, with DeepSeek's expected release next week creating competitive urgency. (199 chars)
AGIBOT Launches GE-Sim 2.0: A Foundation Model for Robot Simulation
AGIBOT has launched GE-Sim 2.0, a foundation model for robot simulation. It allows AI agents to generate and reason within photorealistic simulated environments for planning and training.
Agentic AI in Retail: Experts Warn Against Shifting Liability to Consumers
Industry experts warn that the rush to implement agentic AI in retail carries significant risk. If brands attempt to shift liability for AI mistakes onto customers, they could erode hard-won consumer trust and face increased regulatory scrutiny.