perception
30 articles about perception in AI news
Developer Swaps Dash Cam Analysis for Gemma 4 & Falcon Perception
A developer announced they are replacing their entire dash cam video analysis system with Google's Gemma 4 and Falcon Perception models, signaling a practical shift towards newer, specialized multimodal models for real-time edge applications.
Gemma4 + Falcon Perception Enables Vision-Action Agent Pipeline
A developer shared a pipeline where Gemma4 interprets images, Falcon Perception segments objects with metadata, and Gemma4 reasons to call tools. This demonstrates a modular approach to vision-language-action agents.
Efficient Universal Perception Encoder (EUPE) Family Challenges DINOv2
Researchers introduced the Efficient Universal Perception Encoder (EUPE), a family of compact vision models that achieve performance rivaling the larger DINOv2. This could enable high-quality visual understanding on resource-constrained devices.
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
World2Agent Open-Sources Protocol for Real-World AI Perception
World2Agent open-sourced a protocol to standardize how AI agents perceive the real world via sensors. No adoption metrics or technical details were disclosed.
Anthropic Survey of 80,508 Users Reveals AI's Dual Perception: Hope for Work & Growth, Fear of Unreliability & Job Loss
Anthropic's global study of 80,508 users finds people simultaneously hold hope and fear about AI. Top hopes center on work improvement and personal growth, while top concerns are unreliability, job loss, and reduced autonomy.
Digital Fruit Fly Brain Achieves First Full Perception-Action Loop in Simulation
Startup Eon Systems has demonstrated what appears to be the first complete whole-brain emulation controlling a simulated body. Their digital model of a fruit fly brain, with 125,000 neurons and 50 million synapses, successfully drives realistic behaviors in a physics-simulated fly body.
Nvidia Cosmos 3 Unifies Physical AI — Action as Token
Nvidia's Cosmos 3 unifies physical AI perception, simulation, and action in one model via action-as-token. No benchmark data disclosed yet.
Meta Tuna-2: Encoder-Free Multimodal Model Beats VAE-Based Rivals
Meta released Tuna-2, an encoder-free multimodal model that understands and generates images from raw pixels. It beats encoder-based models on fine-grained perception benchmarks, challenging the dominant VAE/vision encoder paradigm.
Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals
Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.
Horizon Launches Full-Stack AI Platform for Autonomous Driving
Horizon Robotics launched a trio of products—a new chip, an open-source OS, and a smart driving system—aiming to push cars closer to becoming autonomous AI agents. The platform integrates hardware and software for enhanced perception and decision-making.
GenRobot Launches 6-Camera Wearable for Embodied AI Data Capture
GenRobot launched DAS Ego, a wearable with six 2MP cameras for capturing zero-distortion, 270° FOV data. They also open-sourced the 'Gen Ego Data' dataset covering 200+ skills to train models on perception-action causality.
Fei-Fei Li Explains Why 'Open the Top Drawer' Is a Hard AI Problem
AI pioneer Fei-Fei Li breaks down why a simple instruction like 'open the top drawer and watch out for the vase' represents a major unsolved challenge in robotics, requiring robust perception, commonsense reasoning, and efficient learning from sparse rewards.
OpenAI Voice Mode Uses Older, Weaker Model, Not GPT-4o
OpenAI's voice mode, which powers its conversational interface, is not powered by the latest GPT-4o model but by a much older and weaker system, creating a disconnect between user perception and technical reality.
H&M's Rebound Narrative Fails to Convince Investors Despite Turnaround Efforts
The Business of Fashion reports that H&M, once Sweden's most valuable company, is finding it difficult to convince investors of its comeback story despite implementing turnaround strategies. This reflects the gap between internal progress and external perception in competitive retail.
AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions
Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.
The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human
A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.
Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces
Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.
Beyond the Black Box: New Framework Tests AI's True Clinical Reasoning on Heart Signals
Researchers have developed a novel framework to evaluate how well multimodal AI models truly reason about ECG signals, separating perception from deduction. This addresses critical gaps in validating AI's clinical logic beyond superficial metrics.
EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation
Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.
Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens
NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as essential as CAD tools for chip design. He claims this investment eliminates perceptions of difficulty, time, and resource constraints in development.
Impact Analytics Wins 'Demand Forecasting Solution of the Year' for Second
Impact Analytics secured the 2026 'Demand Forecasting Solution of the Year' award from SupplyTech Breakthrough, marking its second straight win. The recognition highlights AI's growing role in retail inventory and pricing optimization.
SVoT Boosts MLLM Spatial Reasoning by 65% via RL-Verified Visual Chains
SVoT uses RL to verify MLLM spatial reasoning states, achieving up to 65% accuracy gains on OOD tests across five domains including Pacman and Gather.
PRS 2026: Netflix Workshop Reveals Industry Shift to LLM-Powered
Netflix's 2026 PRS workshop featured DoorDash, LinkedIn, Pinterest, Google DeepMind, and Stanford, showcasing how LLMs are transforming personalization, recommendation, and search. The event underscored the industry's shift toward integrating large language models into core recommendation pipelines.
MiniCPM-o 4.5 Ships Full-Duplex Omni-Modal AI at 9B Parameters
OpenBMB's MiniCPM-o 4.5 is a 9B open model with full-duplex omni-modal interaction, outperforming Qwen3-Omni-30B-A3B and running under 12GB RAM.
Thinking Machines Unveils Native Multimodal Interaction Model
Thinking Machines unveiled a native interaction model that simultaneously listens, sees, speaks, interrupts, reacts, thinks in background, and uses tools. The approach targets the fundamental turn-based bottleneck of current AI assistants.
Claude Code's Six-Layer Architecture: Harness, Not Magic
Claude Code's six-layer architecture uses a 3-layer context compressor at 92% threshold and Redis-based multi-agent FSM protocol. The model is just one node in a harness.
Wireless Brain Implant Restores Sight in Third Human Patient
Wireless brain implant with 544 electrodes achieves third human implantation, bypassing eyes to create artificial sight via direct visual cortex stimulation.
New Thesis Exposes Critical Flaws in Recommender System Fairness Metrics —
This thesis systematically analyzes offline fairness evaluation measures for recommender systems, revealing flaws in interpretability, expressiveness, and applicability. It proposes novel evaluation approaches and practical guidelines for selecting appropriate measures, directly addressing the confusion caused by un-validated metrics.
Claude Code Regression: How to Diagnose and Fix the Recent Quality Drop
Anthropic's postmortem reveals three regressions in Claude Code: reasoning effort, context retention, and verbosity changes. Here's how to diagnose and fix them.