active perception

30 articles about active perception in AI news

EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation

Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.

Feb 25, 2026 | 70% relevant

Visual-Seeker: Active Visual Reasoning Beats Proprietary MLLMs on 5 Benchmarks

Visual-Seeker achieves SOTA on five multimodal search benchmarks, surpassing proprietary models by actively harvesting visual evidence during search.

Jun 16, 2026 | 72% relevant

mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup

The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.

Apr 4, 2026 | 85% relevant

CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses

Researchers introduce CanViT, the first task- and policy-agnostic Active-Vision Foundation Model (AVFM). It achieves 38.5% mIoU on ADE20K segmentation with a single low-resolution glimpse, outperforming prior active models while using 19.5x fewer FLOPs.

Mar 25, 2026 | 91% relevant

Swedish Study: Attractive Female Students' Grade Premium Vanished in Online Classes, Male Premium Persisted

A Swedish university study of 307 students found attractive female students received higher grades in subjective courses during in-person teaching, but this advantage disappeared when classes moved online. The male beauty premium remained, suggesting appearance-based bias in human grading.

Mar 23, 2026 | 85% relevant

CogSearch: A Multi-Agent Framework for Proactive Decision Support in E-Commerce Search

Researchers from JD.com introduce CogSearch, a cognitive-aligned multi-agent framework that transforms e-commerce search from passive retrieval to proactive decision support. Offline benchmarks and online A/B tests show significant improvements in conversion, especially for complex queries.

Mar 13, 2026 | 99% relevant

MIT's Proactive AI Agents: The Dawn of Autonomous Problem-Solving Systems

MIT researchers have developed proactive AI agents that can autonomously identify and solve problems without human prompting. This breakthrough represents a significant leap from reactive to anticipatory artificial intelligence systems.

Mar 4, 2026 | 85% relevant

Horizon Launches Full-Stack AI Platform for Autonomous Driving

Horizon Robotics launched a trio of products—a new chip, an open-source OS, and a smart driving system—aiming to push cars closer to becoming autonomous AI agents. The platform integrates hardware and software for enhanced perception and decision-making.

Apr 23, 2026 | 82% relevant

OpenAI Expands Codex into Desktop Agent with Vision & Memory

OpenAI has reportedly expanded its Codex model beyond code generation into a multimodal desktop agent that can see, click, type, and learn user habits. This signals a strategic move from an API tool into a proactive, personalized AI assistant.

Apr 17, 2026 | 85% relevant

Unipath Launches Household Robot, Joining China's Push into Consumer Robotics

Chinese company Unipath has launched a household robot. This marks another entry into the competitive consumer robotics market, where Chinese firms are increasingly active.

Mar 30, 2026 | 85% relevant

Snap Brings AI Lenses To Luxury Fashion Campaigns

Snapchat is integrating AI-powered augmented reality lenses into luxury fashion marketing campaigns, offering brands a new channel for immersive, interactive advertising directly within the app's ecosystem.

Mar 24, 2026 | 88% relevant

The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human

A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.

Mar 13, 2026 | 81% relevant

NVIDIA's Nemotron 3 Super: The Efficiency-First AI Model Redefining Performance Benchmarks

NVIDIA unveils Nemotron 3 Super, a 120B-parameter model with only 12B active parameters, built on a hybrid Mamba-Transformer MoE architecture. It supports a 1M-token context, beats GPT-OSS-120B on intelligence metrics, and offers configurable reasoning modes for optimal compute efficiency.

Mar 11, 2026 | 95% relevant

R1's Real-Time World Model: The Paradigm Shift from Video Generation to World Generation

Rabbit's R1 introduces a real-time world model that continuously generates evolving environments rather than static video frames. This represents a fundamental shift from passive content creation to interactive world simulation, enabling seamless AI interactions without waiting or regeneration cycles.

Feb 26, 2026 | 85% relevant

Amazon Launches Generative AI Search Tool That Creates Real-Time Images

Amazon launched a generative AI search tool that creates real-time images from text descriptions to improve product discovery. This leverages Amazon Bedrock and Trainium chips, marking a shift toward AI-driven visual search in e-commerce.

Jun 21, 2026 | 72% relevant

SharpAP: New Attack Method Makes Recommender System Poisoning More

Researchers propose SharpAP, a poisoning attack that uses sharpness-aware minimization to generate fake user profiles that transfer better between different recommender system models, posing a more realistic threat.

Apr 27, 2026 | 93% relevant

Agent Harnessing: The Infrastructure That Makes AI Agents Work

A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination — is what separates production-grade agents from those that fail silently.

Apr 25, 2026 | 88% relevant

NVIDIA Open-Sources Motion Diffusion Model for Humanoid Robots

NVIDIA open-sourced Kimono, a motion diffusion model for humanoid robots, trained on 700 hours of motion capture data. It generates 3D human and robot motions from text prompts, supports keyframe and end-effector control, and runs on Unitree G1.

Apr 23, 2026 | 85% relevant

Dick's Sporting Goods Partners with Adobe to Launch Agentic AI 'Digital Coaches'

Dick's Sporting Goods announced a partnership with Adobe to implement agentic AI 'digital coaches.' These AI agents will provide personalized guidance to customers, aiming to enhance the shopping experience and drive sales.

Apr 21, 2026 | 88% relevant

Google DeepMind Forms 'Strike Team' to Boost AI Coding, Citing Anthropic Pressure

Google has formed a specialized team within DeepMind to rapidly improve its AI coding capabilities. The move is a direct response to internal assessments that Anthropic's tools are more advanced, with leadership pushing for agentic systems.

Apr 20, 2026 | 100% relevant

Catching Drift Before It Catches You

The author details implementing the open-source Evidently AI library to monitor a Kafka-powered movie recommender for data drift. This is a hands-on guide to a fundamental MLOps task for maintaining live AI systems.

Apr 20, 2026 | 96% relevant

Geoffrey Hinton: AI Breaks Historical Job Replacement Cycle

AI pioneer Geoffrey Hinton states that unlike past technological revolutions, AI can replace both physical and intellectual labor simultaneously, breaking the historical cycle of job displacement and creation.

Apr 20, 2026 | 85% relevant

Google DeepMind Researcher: LLMs Can Never Achieve Consciousness

A Google DeepMind researcher has publicly argued that large language models, by their algorithmic nature, can never become conscious, regardless of scale or time. This stance challenges a core speculative narrative in AI discourse.

Apr 18, 2026 | 85% relevant

Unitree H1 Humanoid Robot Shifts from Jog to Run in Seconds

A new video shows Unitree's H1 humanoid robot accelerating from a jogging pace to a running gait in seconds, showcasing improved dynamic locomotion control.

Apr 17, 2026 | 85% relevant

Claude Code's Rust TUI Rewrite Eliminates UI Lag

A developer rebuilt Claude Code's terminal UI in Rust to fix performance issues with multiple agents, large diffs, and long tool-call chains—removing frontend friction that was slowing down the experience.

Apr 16, 2026 | 85% relevant

Meta Deploys Unified AI Agents to Manage Hyperscale Infrastructure

Meta's engineering team has built and deployed a system of unified AI agents to autonomously manage capacity and performance across its hyperscale infrastructure. This represents a significant shift from rule-based automation to AI-driven orchestration for one of the world's largest computing fleets.

Apr 16, 2026 | 70% relevant

Binghamton University Tests Robotic Guide Dog with Natural Language Interface

Researchers at Binghamton University have developed a robotic guide dog prototype that communicates with users using natural language. The system, built on a Unitree Go2 platform, was demonstrated navigating a user through a test environment.

Apr 15, 2026 | 85% relevant

Beijing Humanoid Robot Half Marathon Tests 40% Autonomous Teams

A night-time half-marathon test for humanoid robots in Beijing revealed that approximately 40% of participating teams were running fully autonomous systems, a key benchmark for real-world robotic mobility.

Apr 15, 2026 | 85% relevant

U.K. Retail Loyalty Enters AI Era as M&S

Marks & Spencer, Tesco, and Boots are implementing AI to analyze customer data and deliver hyper-personalized rewards and offers within their loyalty programs. This marks a strategic shift from one-size-fits-all schemes to predictive, individualized engagement to boost retention and spending.

Apr 15, 2026 | 84% relevant

Anthropic Opus 4.7, ChatGPT Image 2 Rumored for Imminent Release

Analyst speculation suggests Anthropic's Claude Opus 4.7 and OpenAI's ChatGPT Image 2 could launch imminently, with DeepSeek's expected release next week creating competitive urgency.

Apr 15, 2026 | 89% relevant
