3d perception
30 articles about 3d perception in AI news
AllenAI's WildDet3D Enables Promptable 3D Object Detection from Single Images
Allen Institute for AI (AllenAI) has open-sourced WildDet3D, a model for promptable 3D object detection from single RGB images. It predicts 3D bounding boxes using flexible prompts and can integrate optional depth data.
Momentum-Consistency Fine-Tuning (MCFT) Achieves 3.30% Gain in 5-Shot 3D Vision Tasks Without Adapters
Researchers propose MCFT, an adapter-free fine-tuning method for 3D point cloud models that selectively updates encoder parameters with momentum constraints. It outperforms prior methods by 3.30% in 5-shot settings and maintains original inference latency.
VAST's $50M Funding Signals 3D AI Revolution: From Foundation Models to World Simulation
AI startup VAST has secured $50 million in Series A funding while advancing its 3D foundation models that are setting new industry standards. The company is preparing to launch its first world model, positioning itself at the forefront of spatial AI development.
Utonia AI Breakthrough: A Single Transformer Model Unifies All 3D Point Cloud Data
Researchers have developed Utonia, a single self-supervised transformer that learns unified 3D representations across diverse point cloud data types including LiDAR, CAD models, indoor scans, and video-lifted data. This breakthrough enables unprecedented cross-domain transfer and emergent behaviors in 3D AI.
BetterScene Bridges the Gap: How Aligning AI Representations Unlocks Photorealistic 3D Synthesis
Researchers introduce BetterScene, a novel AI method that dramatically improves 3D scene generation from just a handful of photos. By aligning the internal representations of a powerful video diffusion model, it produces consistent, artifact-free novel views, pushing the boundary of what's possible in computational photography and virtual world creation.
CLIPoint3D Bridges the 3D Reality Gap: How Language Models Are Revolutionizing Point Cloud Adaptation
Researchers have developed CLIPoint3D, a novel framework that leverages frozen CLIP backbones for few-shot unsupervised 3D point cloud domain adaptation. The approach achieves 3-16% accuracy gains over conventional methods while dramatically improving efficiency by avoiding heavy trainable encoders.
JAEGER Breaks the 2D Barrier: How 3D Audio-Visual AI Could Transform Robotics and AR
Researchers introduce JAEGER, a framework that extends audio-visual large language models into 3D space using RGB-D and spatial audio. This breakthrough enables AI to understand and reason about physical environments with unprecedented spatial awareness.
Radar Meets AI: How RF Signals Are Revolutionizing 3D Scene Reconstruction
Researchers have developed a multimodal approach combining radio-frequency sensing with Gaussian Splatting to create robust 3D scene rendering that works in challenging conditions where vision alone fails. This breakthrough enables high-fidelity reconstruction in adverse weather, low light, and through occlusions.
Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals
Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.
NVIDIA Open-Sources Motion Diffusion Model for Humanoid Robots
NVIDIA open-sourced Kimono, a motion diffusion model for humanoid robots, trained on 700 hours of motion capture data. It generates 3D human and robot motions from text prompts, supports keyframe and end-effector control, and runs on Unitree G1.
GenRobot Launches 6-Camera Wearable for Embodied AI Data Capture
GenRobot launched DAS Ego, a wearable with six 2MP cameras for capturing zero-distortion, 270° FOV data. They also open-sourced the 'Gen Ego Data' dataset covering 200+ skills to train models on perception-action causality.
Fei-Fei Li Explains Why 'Open the Top Drawer' Is a Hard AI Problem
AI pioneer Fei-Fei Li breaks down why a simple instruction like 'open the top drawer and watch out for the vase' represents a major unsolved challenge in robotics, requiring robust perception, commonsense reasoning, and efficient learning from sparse rewards.
FORGE Benchmark Reveals Domain Knowledge
Researchers introduced FORGE, a multimodal dataset with 2D/3D data and fine-grained annotations for manufacturing. Evaluating 18 MLLMs revealed domain knowledge, not visual grounding, is the key bottleneck, with fine-tuning offering a clear path forward.
MIT Researchers Develop GelSight Svelte Tactile Sensor for Delicate, Heavy Object Manipulation
MIT's CSAIL introduced GelSight Svelte, a thin, high-resolution tactile sensor enabling robots to grip delicate, heavy objects by measuring shear forces. The sensor uses two cameras and colored LEDs to track gel deformation, providing detailed 3D force maps for precise manipulation.
Impact Analytics Wins 'Demand Forecasting Solution of the Year' for Second
Impact Analytics secured the 2026 'Demand Forecasting Solution of the Year' award from SupplyTech Breakthrough, marking its second straight win. The recognition highlights AI's growing role in retail inventory and pricing optimization.
SVoT Boosts MLLM Spatial Reasoning by 65% via RL-Verified Visual Chains
SVoT uses RL to verify MLLM spatial reasoning states, achieving up to 65% accuracy gains on OOD tests across five domains including Pacman and Gather.
Catching Drift Before It Catches You
The author details implementing the open-source Evidently AI library to monitor a Kafka-powered movie recommender for data drift. This is a hands-on guide to a fundamental MLOps task for maintaining live AI systems.
OpenAI Expands Codex into Desktop Agent with Vision & Memory
OpenAI has reportedly expanded its Codex model beyond code generation into a multimodal desktop agent that can see, click, type, and learn user habits. This signals a strategic move from an API tool into a proactive, personalized AI assistant.
Ray Kurzweil Predicts AI Consciousness Acceptance by 2026
Futurist Ray Kurzweil predicts AI will soon exhibit all signs of consciousness, leading to widespread acceptance. This is expected to drive a major resurgence of philosophical debates on consciousness and humanity in 2026.
AGIBOT Launches GE-Sim 2.0: A Foundation Model for Robot Simulation
AGIBOT has launched GE-Sim 2.0, a foundation model for robot simulation. It allows AI agents to generate and reason within photorealistic simulated environments for planning and training.
AI-Powered Drone De-Ices Power Lines in Sub-Zero Fog
A drone system autonomously navigates thick fog and snow to de-ice high-voltage power lines. This removes the need for hazardous manual crew climbs, improving grid reliability and safety.
Virtual Try-on of New Clothes Through AI - Unite.AI
The source is a news article from Unite.AI discussing AI-driven virtual try-on technology for clothing. This is a direct application for the retail and luxury sector, aiming to enhance online shopping experiences.
Token Warping for MLLMs Outperforms Pixel Methods in View Synthesis
Researchers propose warping image tokens instead of pixels for multi-view reasoning in MLLMs. The zero-shot method is robust to depth noise and outperforms established baselines.
NVIDIA Spotlights Physical AI Tools for Robotics Week 2026
NVIDIA is highlighting its platforms for robot simulation, synthetic data, and AI-powered learning during National Robotics Week 2026, aiming to accelerate the transition from virtual training to physical deployment.
Generative World Renderer: 4M+ RGB/G-Buffer Frames from Cyberpunk 2077 & Black Myth: Wukong Released for Inverse Graphics
A new framework and dataset extracts over 4 million synchronized RGB and G-buffer frames from Cyberpunk 2077 and Black Myth: Wukong, enabling AI models to learn inverse material decomposition and controllable game environment editing.
Atlanta Startup Deploys AI-Powered Robot Dogs for Nighttime Neighborhood Security
A U.S. startup based in Atlanta is deploying quadrupedal robots for autonomous nighttime neighborhood patrols. The units are designed to detect intruders and alert residents, representing a commercial pivot for legged robotics.
Elon Musk Claims Tesla Optimus Will Surpass Human Surgeons by 2029, Advises Against Medical School
Elon Musk stated Tesla's Optimus humanoid robot will outperform any human surgeon at scale within three years, calling medical school 'pointless.' He predicts universal access to superior medical care within five years.
Facebook's SAM 3 Vision Model Ported to Apple's MLX Framework, Enabling Real-Time Tracking on M3 Max
Facebook's Segment Anything Model 3 (SAM 3) has been ported to Apple's MLX framework, enabling real-time object tracking on an M3 Max MacBook Pro. This demonstrates efficient on-device execution of a foundational vision model without cloud dependency.
Fei-Fei Li Argues Spatial Intelligence is the 'Other Half' of AI Beyond Language
AI pioneer Dr. Fei-Fei Li states that true intelligence requires spatial understanding alongside language. This perspective directly challenges the current LLM-centric paradigm.
Ego2Web Benchmark Bridges Egocentric Video and Web Agents, Exposing Major Performance Gaps
Researchers introduce Ego2Web, the first benchmark requiring AI agents to understand real-world first-person video and execute related web tasks. Their novel Ego2WebJudge evaluation method achieves 84% human agreement, while state-of-the-art agents perform poorly across all task categories.