navigation

30 articles about navigation in AI news

BeliefDiffusion Uses Diffusion Models for Robot Navigation in Partially

BeliefDiffusion combines diffusion models with MPC for robot navigation in partially observable environments, outperforming model-free RL and generative baselines in synthetic maps.

Jun 18, 2026 · 69% relevant

Stanford's EgoNav Trains Robot Navigation on 5 Hours of Human Video, Enables Zero-Shot Control of Unitree G1

Stanford's EgoNav system uses a 5-hour egocentric video walk of campus to train a diffusion model that enables zero-shot navigation for a Unitree G1 humanoid robot, eliminating the need for robot-specific training data.

Apr 3, 2026 · 99% relevant

QAsk-Nav Benchmark Enables Separate Scoring of Navigation and Dialogue for Collaborative AI Agents

A new benchmark called QAsk-Nav enables separate evaluation of navigation and question-asking for collaborative embodied AI agents. The accompanying Light-CoNav model outperforms state-of-the-art methods while being significantly more efficient.

Apr 2, 2026 · 75% relevant

How to Cut Claude Code's Token Costs 32% by Fixing Its Navigation Problem

Claude Code agents waste tokens on grep-style navigation. A new open-source tool gives them IDE-like navigation, cutting costs 32% and doubling efficiency.

Mar 24, 2026 · 92% relevant

Google Maps Gets an AI Brain: How Gemini Transforms Navigation from Directions to Dialogue

Google is fundamentally reshaping Maps by integrating its Gemini AI, launching 'Ask Maps' for conversational discovery and 'Immersive Navigation' for a complete visual and data-driven route overhaul. This represents a shift from static maps to intelligent, proactive travel assistants.

Mar 12, 2026 · 99% relevant

Doby Cuts Claude Code Navigation Tokens by 95% with Spec-First Workflow

A spec-first fix workflow that slashes navigation tokens 95% and enforces plan docs as source of truth before code changes.

Apr 24, 2026 · 100% relevant

Wikipedia Navigation Challenge Exposes Critical Gaps in AI Planning Abilities

Researchers introduce LLM-WikiRace, a benchmark testing how well AI models navigate Wikipedia links between concepts. While top models like Gemini-3 show superhuman performance on easy tasks, success rates plummet to just 23% on hard challenges, revealing fundamental limitations in long-term planning.

Feb 20, 2026 · 70% relevant

Dimos OS Launches as Open-Source Robot OS with AI Agent MCP Access

Dimos OS is a new open-source operating system for robots that lets developers write Python modules and gives AI agents direct control via MCP. It includes a full navigation stack and supports hardware like Unitree G1 and DJI drones.

Apr 16, 2026 · 99% relevant

New Framework Reveals LLM GUI Agents Don't Navigate Like Humans

Researchers introduced a trace-level framework to compare human and GUI-agent behavior in a production search system. While the agent matched human success rates and query alignment, its navigation was systematically more search-centric and less exploratory. This reveals a critical gap in using agents as user proxies.

Apr 10, 2026 · 82% relevant

Google DeepMind Unveils Gemini-Powered Browser That Generates Websites in Real-Time

Google DeepMind has demonstrated a browser prototype powered by Gemini 3.1 Flash-Lite that generates complete HTML/CSS websites dynamically based on user prompts and navigation context, shifting from static page retrieval to on-demand interface generation.

Mar 25, 2026 · 95% relevant

Cursor Launches Instant Grep: Millisecond Local Search Across Millions of Files

Cursor has launched Instant Grep, a local search tool that performs millisecond-level regex searches across millions of files. The feature is integrated into the Cursor IDE, targeting developers needing fast, offline code navigation.

Mar 24, 2026 · 85% relevant

AgentComm-Bench Exposes Catastrophic Failure Modes in Cooperative Embodied AI Under Real-World Network Conditions

Researchers introduce AgentComm-Bench, a benchmark that stress-tests multi-agent embodied AI systems under six real-world network impairments. It reveals performance drops of over 96% in navigation and 85% in perception F1, highlighting a critical gap between lab evaluations and deployable systems.

Mar 24, 2026 · 95% relevant

Claude AI Gains Computer Control Feature: Opens Apps, Navigates Browser, Fills Spreadsheets

Anthropic's Claude AI can now be enabled to directly control a user's computer to perform tasks like opening applications, browser navigation, and spreadsheet work. This represents a significant shift from chat-based interaction to direct system automation.

Mar 23, 2026 · 87% relevant

MiRA Framework Boosts Gemma3-12B to 43% Success Rate on WebArena-Lite, Surpassing GPT-4 and WebRL

Researchers propose MiRA, a milestone-based RL framework that improves long-horizon planning in LLM agents. It boosts Gemma3-12B's web navigation success from 6.4% to 43%, outperforming GPT-4-Turbo (17.6%) and the previous SOTA WebRL (38.4%).

Mar 23, 2026 · 77% relevant

InterDeepResearch: A New Framework for Human-Agent Collaborative Information Seeking

Researchers propose InterDeepResearch, an interactive system that enables human collaboration with LLM-powered research agents. It addresses limitations of autonomous systems by improving observability, steerability, and context navigation for complex information tasks.

Mar 16, 2026 · 76% relevant

Bridging the StarCraft Gap: New AI Benchmark Makes Strategy Research Accessible

Researchers introduce Two-Bridge Map Suite, a lightweight StarCraft II benchmark that isolates tactical skills without full-game complexity. This open-source tool enables reinforcement learning experiments on realistic budgets by focusing on navigation and combat mechanics.

Mar 10, 2026 · 75% relevant

FDM-1: The AI That Learned to Use Computers by Watching 11 Million Hours of Screen Recordings

Standard Intelligence has unveiled FDM-1, an AI system trained on 11 million hours of screen recordings that can perform complex computer tasks like CAD design, web navigation, and even simulated driving with minimal fine-tuning.

Feb 24, 2026 · 95% relevant

Switchboard's Grid View Gives You Bird's-Eye Control of Claude Code Sessions

Switchboard v0.0.16 adds a grid view that shows all your Claude Code sessions at once with live terminal previews, status indicators, and quick navigation.

Mar 21, 2026 · 95% relevant

NVIDIA Drops Fast-FoundationStereo: 10× Faster Depth Estimation

NVIDIA released Fast-FoundationStereo, a real-time foundation model for zero-shot stereo depth estimation that is 10× faster than FoundationStereo with matching accuracy.

Jun 26, 2026 · 85% relevant

General Intuition Raises $320M at $2.3B to Train AI on Gameplay Actions

General Intuition raised $320M at $2.3B to train AI on action labels from gameplay, claiming the model generalizes to robots with 8 minutes of fine-tuning.

Jun 25, 2026 · 100% relevant

Gemini 3.5 Flash Scores 78.4 on OSWorld, Matching GPT-5.5

Google integrated Computer Use into Gemini 3.5 Flash, which scores 78.4 on OSWorld, matching GPT-5.5 while undercutting it on cost.

Jun 25, 2026 · 100% relevant

Alibaba Launches Qwen Robot Suite, Embodied AI for Unitree Go2

Alibaba launched Qwen Robot Suite, its first embodied AI models for robots, on June 17. The suite targets the Unitree Go2 with a single-camera setup, entering pilot testing with enterprise clients.

Jun 16, 2026 · 98% relevant

Huawei HarmonyOS 7 Ships 2,100 System-Level AI Agent Capabilities

Huawei launched HarmonyOS 7 with Xiaoyi as a system-level AI agent exposing 2,100 capabilities, shifting from app-centric to intent-driven interaction.

Jun 14, 2026 · 94% relevant

Spirit AI Tops RoboArena, Beats Nvidia and Physical Intelligence

Spirit AI topped the RoboArena benchmark at GTC Taipei 2026, beating Nvidia and Physical Intelligence and marking China's rise in embodied AI.

Jun 4, 2026 · 90% relevant

Agent Harness Scaling: EFC Predicts Success at R² 0.99 vs 0.42

New research introduces Effective Feedback Compute (EFC), which predicts agent success with an R² of 0.99, versus 0.42 for raw tokens. Reallocating compute by EFC lifts success 3x at the same budget.

May 29, 2026 · 88% relevant

Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim

A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.

May 16, 2026 · 84% relevant

SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld

SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.

May 15, 2026 · 85% relevant

Anthropic's Claude Design Reads Your Codebase, Drops Figma Stock 7%

Anthropic launched Claude Design, a visual workspace that reads codebases for brand consistency. Figma stock dropped 7% on the announcement.

May 7, 2026 · 80% relevant

Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels

Ctx2Skill discovers skills from context via multi-agent self-play without labels. Outputs plug into any LM, targeting manual prompt engineering bottlenecks.

May 5, 2026 · 85% relevant

Japan Deploys Unitree G1 Robots at Haneda Airport Amid Labor Shortage

Japan is testing Unitree G1 and taller humanoid robots at Tokyo Haneda Airport to tackle its labor shortage crisis, marking a real-world deployment of AI-driven robotics.

Apr 29, 2026 · 85% relevant
