navigation
30 articles about navigation in AI news
BeliefDiffusion Uses Diffusion Models for Robot Navigation in Partially Observable Environments
BeliefDiffusion combines diffusion models with MPC for robot navigation in partially observable environments, outperforming model-free RL and generative baselines in synthetic maps.
Stanford's EgoNav Trains Robot Navigation on 5 Hours of Human Video, Enables Zero-Shot Control of Unitree G1
Stanford's EgoNav system trains a diffusion model on 5 hours of egocentric video from a human walking across campus, enabling zero-shot navigation for a Unitree G1 humanoid robot and eliminating the need for robot-specific training data.
QAsk-Nav Benchmark Enables Separate Scoring of Navigation and Dialogue for Collaborative AI Agents
A new benchmark called QAsk-Nav enables separate evaluation of navigation and question-asking for collaborative embodied AI agents. The accompanying Light-CoNav model outperforms state-of-the-art methods while being significantly more efficient.
How to Cut Claude Code's Token Costs 32% by Fixing Its Navigation Problem
Claude Code agents waste tokens on grep-style navigation. A new open-source tool gives them IDE-like navigation, cutting costs 32% and doubling efficiency.
Google Maps Gets an AI Brain: How Gemini Transforms Navigation from Directions to Dialogue
Google is fundamentally reshaping Maps by integrating its Gemini AI, launching 'Ask Maps' for conversational discovery and 'Immersive Navigation' for a complete visual and data-driven route overhaul. This represents a shift from static maps to intelligent, proactive travel assistants.
Doby Cuts Claude Code Navigation Tokens by 95% with Spec-First Workflow
Doby's spec-first fix workflow cuts navigation tokens by 95% and enforces plan docs as the source of truth before code changes.
Wikipedia Navigation Challenge Exposes Critical Gaps in AI Planning Abilities
Researchers introduce LLM-WikiRace, a benchmark testing how well AI models navigate Wikipedia links between concepts. While top models like Gemini-3 show superhuman performance on easy tasks, success rates plummet to just 23% on hard challenges, revealing fundamental limitations in long-term planning.
Dimos OS Launches as Open-Source Robot OS with AI Agent MCP Access
Dimos OS is a new open-source operating system for robots that lets developers write Python modules and gives AI agents direct control via MCP. It includes a full navigation stack and supports hardware like Unitree G1 and DJI drones.
New Framework Reveals LLM GUI Agents Don't Navigate Like Humans
Researchers introduced a trace-level framework to compare human and GUI-agent behavior in a production search system. While the agent matched human success rates and query alignment, its navigation was systematically more search-centric and less exploratory. This reveals a critical gap in using agents as user proxies.
Google DeepMind Unveils Gemini-Powered Browser That Generates Websites in Real-Time
Google DeepMind has demonstrated a browser prototype powered by Gemini 3.1 Flash-Lite that generates complete HTML/CSS websites dynamically based on user prompts and navigation context, shifting from static page retrieval to on-demand interface generation.
Cursor Launches Instant Grep: Millisecond Local Search Across Millions of Files
Cursor has launched Instant Grep, a local search tool that performs millisecond-level regex searches across millions of files. The feature is integrated into the Cursor IDE, targeting developers needing fast, offline code navigation.
AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions
Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.
Claude AI Gains Computer Control Feature: Opens Apps, Navigates Browser, Fills Spreadsheets
Anthropic's Claude AI can now be enabled to directly control a user's computer to perform tasks like opening applications, browser navigation, and spreadsheet work. This represents a significant shift from chat-based interaction to direct system automation.
MiRA Framework Boosts Gemma3-12B to 43% Success Rate on WebArena-Lite, Surpassing GPT-4 and WebRL
Researchers propose MiRA, a milestone-based RL framework that improves long-horizon planning in LLM agents. It boosts Gemma3-12B's web navigation success from 6.4% to 43%, outperforming GPT-4-Turbo (17.6%) and the previous SOTA WebRL (38.4%).
InterDeepResearch: A New Framework for Human-Agent Collaborative Information Seeking
Researchers propose InterDeepResearch, an interactive system that enables human collaboration with LLM-powered research agents. It addresses limitations of autonomous systems by improving observability, steerability, and context navigation for complex information tasks.
Bridging the StarCraft Gap: New AI Benchmark Makes Strategy Research Accessible
Researchers introduce Two-Bridge Map Suite, a lightweight StarCraft II benchmark that isolates tactical skills without full-game complexity. This open-source tool enables reinforcement learning experiments on realistic budgets by focusing on navigation and combat mechanics.
FDM-1: The AI That Learned to Use Computers by Watching 11 Million Hours of Screen Recordings
Standard Intelligence has unveiled FDM-1, an AI system trained on 11 million hours of screen recordings that can perform complex computer tasks like CAD design, web navigation, and even simulated driving with minimal fine-tuning.
Switchboard's Grid View Gives You Bird's-Eye Control of Claude Code Sessions
Switchboard v0.0.16 adds a grid view that shows all your Claude Code sessions at once with live terminal previews, status indicators, and quick navigation.
NVIDIA Drops Fast-FoundationStereo: 10× Faster Depth Estimation
NVIDIA released Fast-FoundationStereo, a real-time foundation model for zero-shot stereo depth estimation that is 10× faster than FoundationStereo with matching accuracy.
General Intuition Raises $320M at $2.3B to Train AI on Gameplay Actions
General Intuition raised $320M at a $2.3B valuation to train AI on action labels from gameplay, claiming the model generalizes to robots with 8 minutes of fine-tuning.
Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5
Google integrated Computer Use into Gemini 3.5 Flash, which scores 78.4 on OSWorld, matching GPT-5.5 while undercutting it on cost.
Alibaba Launches Qwen Robot Suite, Embodied AI for Unitree Go2
Alibaba launched Qwen Robot Suite, its first embodied AI models for robots, on June 17. The suite targets the Unitree Go2 with a single-camera setup, entering pilot testing with enterprise clients.
Huawei HarmonyOS 7 Ships 2,100 System-Level AI Agent Capabilities
Huawei launched HarmonyOS 7 with Xiaoyi as a system-level AI agent exposing 2,100 capabilities, shifting from app-centric to intent-driven interaction.
Spirit AI Tops RoboArena, Beats Nvidia and Physical Intelligence
Spirit AI topped the RoboArena benchmark at GTC Taipei 2026, beating Nvidia and Physical Intelligence and marking China's rise in embodied AI.
Agent Harness Scaling: EFC Predicts Success at R² 0.99 vs 0.42
New research introduces Effective Feedback Compute (EFC), which predicts agent success with an R² of 0.99, versus 0.42 for raw tokens. Reallocating compute by EFC lifts success 3x at the same budget.
Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim
A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.
SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld
SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.
Anthropic's Claude Design Reads Your Codebase, Drops Figma Stock 7%
Anthropic launched Claude Design, a visual workspace that reads codebases for brand consistency. Figma stock dropped 7% on the announcement.
Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels
Ctx2Skill discovers skills from context via multi-agent self-play without labels. Its outputs plug into any LM, targeting the bottleneck of manual prompt engineering.
Japan Deploys Unitree G1 Robots at Haneda Airport Amid Labor Shortage
Japan is testing Unitree G1 and taller humanoid robots at Tokyo Haneda Airport to tackle its labor shortage crisis, marking a real-world deployment of AI-driven robotics.