perception

30 articles about perception in AI news

Developer Swaps Dash Cam Analysis for Gemma 4 & Falcon Perception

A developer announced they are replacing their entire dash cam video analysis system with Google's Gemma 4 and Falcon Perception models, signaling a practical shift towards newer, specialized multimodal models for real-time edge applications.

Apr 15, 2026 · 75% relevant

Gemma 4 + Falcon Perception Enables Vision-Action Agent Pipeline

A developer shared a pipeline in which Gemma 4 interprets images, Falcon Perception segments objects and attaches metadata to them, and Gemma 4 then reasons over the results to decide which tools to call. This demonstrates a modular approach to vision-language-action agents.

Apr 6, 2026 · 85% relevant
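The three-stage flow described above (interpret, segment, reason and call a tool) can be sketched as follows. This is a minimal illustration under stated assumptions, not the developer's code. `describe_image`, `segment_objects`, `choose_tool` and `count_objects` are hypothetical stubs standing in for the Gemma 4 and Falcon Perception calls, and they return fixed values so the control flow can run on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Segment:
    label: str
    bbox: tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    metadata: dict = field(default_factory=dict)


# Hypothetical stub: a real pipeline would ask Gemma 4 to describe the image.
def describe_image(image) -> str:
    return "a street scene with a parked car and a pedestrian"


# Hypothetical stub: a real pipeline would ask Falcon Perception to segment objects.
def segment_objects(image, query: str) -> list[Segment]:
    return [
        Segment("car", (10, 40, 120, 110), {"confidence": 0.93}),
        Segment("pedestrian", (150, 30, 180, 120), {"confidence": 0.88}),
    ]


# Example tool the reasoning step can call (hypothetical).
def count_objects(segments: list[Segment], label: str) -> int:
    return sum(1 for s in segments if s.label == label)


TOOLS: dict[str, Callable] = {"count_objects": count_objects}


# Hypothetical stub: in the described pipeline Gemma 4 would pick the tool and its arguments.
def choose_tool(description: str, segments: list[Segment]) -> tuple[str, dict]:
    return "count_objects", {"label": "pedestrian"}


def run_pipeline(image):
    description = describe_image(image)                    # 1. VLM interprets the image
    segments = segment_objects(image, description)         # 2. segmenter returns objects + metadata
    tool_name, args = choose_tool(description, segments)   # 3. VLM reasons and selects a tool
    return TOOLS[tool_name](segments, **args)              # 4. execute the chosen tool
```

Keeping each stage behind its own function is what makes the approach modular. In this sketch, the segmenter or the reasoning model could be swapped out without touching the rest of the loop.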

Efficient Universal Perception Encoder (EUPE) Family Challenges DINOv2

Researchers introduced the Efficient Universal Perception Encoder (EUPE), a family of compact vision models that achieve performance rivaling the larger DINOv2. This could enable high-quality visual understanding on resource-constrained devices.

Apr 6, 2026 · 85% relevant

mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup

Version 0.4.4 of the mlx-vlm library adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.

Apr 4, 2026 · 85% relevant

World2Agent Open-Sources Protocol for Real-World AI Perception

World2Agent open-sourced a protocol to standardize how AI agents perceive the real world via sensors. No adoption metrics or technical details were disclosed.

May 4, 2026 · 85% relevant

Anthropic Survey of 80,508 Users Reveals AI's Dual Perception: Hope for Work & Growth, Fear of Unreliability & Job Loss

Anthropic's global study of 80,508 users finds people simultaneously hold hope and fear about AI. Top hopes center on work improvement and personal growth, while top concerns are unreliability, job loss, and reduced autonomy.

Mar 18, 2026 · 87% relevant

Digital Fruit Fly Brain Achieves First Full Perception-Action Loop in Simulation

Startup Eon Systems has demonstrated what appears to be the first complete whole-brain emulation controlling a simulated body. Their digital model of a fruit fly brain, with 125,000 neurons and 50 million synapses, successfully drives realistic behaviors in a physics-simulated fly body.

Mar 8, 2026 · 95% relevant

Nvidia Cosmos 3 Unifies Physical AI — Action as Token

Nvidia's Cosmos 3 unifies physical AI perception, simulation, and action in a single model by representing actions as tokens. No benchmark data has been disclosed yet.

Jun 14, 2026 · 87% relevant

Meta Tuna-2: Encoder-Free Multimodal Model Beats VAE-Based Rivals

Meta released Tuna-2, an encoder-free multimodal model that understands and generates images from raw pixels. It beats encoder-based models on fine-grained perception benchmarks, challenging the dominant VAE/vision encoder paradigm.

Apr 28, 2026 · 90% relevant

Meta's Sapiens2: ViTs Pretrained on 1B Human Images for Pose, Segmentation, Normals

Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.

Apr 23, 2026 · 92% relevant

Horizon Launches Full-Stack AI Platform for Autonomous Driving

Horizon Robotics launched a trio of products—a new chip, an open-source OS, and a smart driving system—aiming to push cars closer to becoming autonomous AI agents. The platform integrates hardware and software for enhanced perception and decision-making.

Apr 23, 2026 · 82% relevant

GenRobot Launches 6-Camera Wearable for Embodied AI Data Capture

GenRobot launched DAS Ego, a wearable with six 2MP cameras for capturing zero-distortion, 270° FOV data. They also open-sourced the 'Gen Ego Data' dataset covering 200+ skills to train models on perception-action causality.

Apr 21, 2026 · 97% relevant

Fei-Fei Li Explains Why 'Open the Top Drawer' Is a Hard AI Problem

AI pioneer Fei-Fei Li breaks down why a simple instruction like 'open the top drawer and watch out for the vase' represents a major unsolved challenge in robotics, requiring robust perception, commonsense reasoning, and efficient learning from sparse rewards.

Apr 19, 2026 · 85% relevant

OpenAI Voice Mode Uses Older, Weaker Model, Not GPT-4o

OpenAI's voice mode, which powers its conversational interface, is not powered by the latest GPT-4o model but by a much older and weaker system, creating a disconnect between user perception and technical reality.

Apr 10, 2026 · 75% relevant

H&M's Rebound Narrative Fails to Convince Investors Despite Turnaround Efforts

The Business of Fashion reports that H&M, once Sweden's most valuable company, is finding it difficult to convince investors of its comeback story despite implementing turnaround strategies. This reflects the gap between internal progress and external perception in competitive retail.

Apr 7, 2026 · 74% relevant

AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions

Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.

Mar 24, 2026 · 95% relevant

The Next Frontier for Self-Driving Cars: Teaching AI to Think Like a Human

A new survey argues that autonomous driving's biggest hurdle is no longer perception but a lack of robust reasoning. The integration of large language models offers a path forward but creates a critical tension between slow deliberation and split-second safety.

Mar 13, 2026 · 81% relevant

Microsoft's Phi-4-Vision: A Compact AI Model That Excels at Math, Science, and Understanding Interfaces

Microsoft has released Phi-4-reasoning-vision-15B, a 15-billion parameter open-weight multimodal model designed for tasks requiring both visual perception and selective reasoning. The compact model excels at scientific, mathematical, and GUI understanding while balancing compute efficiency.

Mar 6, 2026 · 85% relevant

Beyond the Black Box: New Framework Tests AI's True Clinical Reasoning on Heart Signals

Researchers have developed a novel framework to evaluate how well multimodal AI models truly reason about ECG signals, separating perception from deduction. This addresses critical gaps in validating AI's clinical logic beyond superficial metrics.

Mar 3, 2026 · 75% relevant

EmbodiedAct: How Active AI Agents Are Revolutionizing Scientific Simulation

Researchers have developed EmbodiedAct, a framework that transforms scientific software into active AI agents with real-time perception. This breakthrough addresses critical limitations in how LLMs interact with physical simulations, enabling more reliable scientific discovery through embodied actions.

Feb 25, 2026 · 70% relevant

Jensen Huang's AI Productivity Mandate: Engineers Must Spend 50% of Salary on AI Tokens

NVIDIA CEO Jensen Huang argues that a $500K engineer should spend at least $250K annually on AI inference tokens, framing token consumption as essential as CAD tools for chip design. He claims this investment eliminates perceptions of difficulty, time, and resource constraints in development.

Mar 20, 2026 · 85% relevant

AgiBot WITA-Omni Scores 85.21 on DailyOmni, Beats Gemini

AgiBot WITA-Omni scores 85.21 on DailyOmni benchmark, beating Google Gemini, ByteDance Doubao, and Alibaba Qwen with a novel Thinker-Talker-Actor architecture.

Jul 30, 2026 · 100% relevant

BYD HyWorldVLA Hits 90.59 PDMS on NAVSIM v1

BYD's HyWorldVLA achieved 90.59 PDMS on NAVSIM v1, a new SOTA, using a hybrid pixel-latent world model. It marks BYD's entry into autonomous driving foundation models.

Jul 29, 2026 · 100% relevant

Microsoft MAI-Cyber-1-Flash Hits 96% on CyberGym

Microsoft's MAI-Cyber-1-Flash scores 96% on CyberGym, cutting costs 50% by handling 90% of security tasks locally while routing complex cases to GPT-5.4.

Jul 27, 2026 · 100% relevant

Alibaba Releases RynnBrain 1.1 Embodied AI Models at 2B-122B Scales

Alibaba released RynnBrain 1.1 on Hugging Face with 2B, 9B, and 122B-A10B MoE models for robot manipulation, but disclosed no benchmarks.

Jul 25, 2026 · 100% relevant

ActiveVision Benchmark: Humans 96.1%, Best AI 10.6%

On the ActiveVision benchmark, humans score 96.1% while the best AI model scores 10.6%. The 85.5-point gap reveals fundamental limits in iterative visual reasoning for current models.

Jul 23, 2026 · 85% relevant

Most Digital Shoppers Still Aren't Sold on AI Shopping, eMarketer Reports

eMarketer reports that most digital shoppers remain unconvinced by AI shopping tools, posing a trust and adoption challenge for retailers investing in the technology.

Jul 17, 2026 · 66% relevant

AI Cuts Ecommerce Costs 30%: 3 Shifts Reshaping Online Retail in 2026

Digital Commerce 360 reports three AI-driven ecommerce trends for 2026: agentic commerce, hyper-personalization, and automation. Early adopters such as Shopify and Walmart report 30% cost cuts and 15-20% conversion boosts.

Jul 16, 2026 · 64% relevant

ShamlaTech Launches AI Agent for Shopify

ShamlaTech launched an AI agent for Shopify, WooCommerce, and Magento stores in the U.S. that automates customer service, order management, and inventory. The launch gives mid-market merchants access to agentic commerce capabilities.

Jul 13, 2026 · 100% relevant

GenCeption: Video Diffusion Backbone Beats Specialists on 5 Vision Tasks

GenCeption uses video diffusion as a vision backbone, matching specialists with 7-500x less data and generalizing from synthetic to real footage.

Jul 13, 2026 · 82% relevant
