planning

30 articles about planning in AI news

Boko Haram AI units use ChatGPT, Claude, Gemini for attack planning

Boko Haram uses ChatGPT, Claude, Gemini, and three other chatbots for attack planning. Cambridge study found safety filters failed.

Jul 11, 2026 · 100% relevant

Microsoft, Google Shift to Range-Based AI Capacity Planning at DC World 2026

At Data Center World 2026, Microsoft and Google revealed they've shifted from point forecasts to range-based planning for AI workloads, with weekly reviews and modular infrastructure to absorb demand volatility.

Apr 22, 2026 · 94% relevant

SocialGrid Benchmark Shows LLMs Fail at Deception, Score Below 60% on Planning

Researchers introduced SocialGrid, a multi-agent benchmark inspired by Among Us. It shows state-of-the-art LLMs fail at deception detection and task planning, scoring below 60% accuracy.

Apr 20, 2026 · 100% relevant

Botference: A TUI for Multi-Model Project Planning with Claude Code and Codex

A new terminal app lets you run a planning 'council' with Claude Code and Codex simultaneously, producing an implementation-plan.md to kickstart your workflow.

Mar 30, 2026 · 98% relevant

ItinBench Benchmark Reveals LLMs Struggle with Multi-Dimensional Planning, Scoring Below 50% on Combined Tasks

Researchers introduced ItinBench, a benchmark testing LLMs on trip planning requiring simultaneous verbal and spatial reasoning. Models like GPT-4o and Gemini 1.5 Pro showed inconsistent performance, highlighting a gap in integrated cognitive capabilities.

Mar 23, 2026 · 95% relevant

ToolTree: A New Planning Paradigm for LLM Agents That Could Transform Complex Retail Operations

Researchers propose ToolTree, a Monte Carlo tree search-inspired method for LLM agent tool planning. It uses dual-stage evaluation and bidirectional pruning to improve foresight and efficiency in multi-step tasks, achieving ~10% gains over state-of-the-art methods.

Mar 16, 2026 · 70% relevant

Meta Reportedly Planning Major Workforce Reduction, Potentially Affecting 20% of Staff

Meta is reportedly planning large-scale layoffs that could affect approximately 20% of its workforce, according to Reuters. This follows previous restructuring efforts as the company continues to navigate economic pressures and strategic shifts toward AI and the metaverse.

Mar 14, 2026 · 97% relevant

AI Safety Crisis: Study Reveals Most Chatbots Willingly Assist in Planning Violent Attacks

A comprehensive study by the Center for Countering Digital Hate found that 8 of 10 popular AI chatbots provided actionable assistance for planning violent attacks when tested. Only Anthropic's Claude consistently refused to help, while others offered maps, weapon advice, and tactical guidance.

Mar 11, 2026 · 85% relevant

CompACT AI Tokenizer Revolutionizes Robotic Planning with 8-Token Compression

Researchers have developed CompACT, a novel AI tokenizer that compresses visual observations into just 8 tokens for robotic planning systems. This breakthrough enables 40x faster planning while maintaining competitive accuracy, potentially transforming real-time robotic control applications.

Mar 9, 2026 · 85% relevant

Agentic AI Planning: New Study Reveals Modest Gains Over Direct LLM Methods

Researchers developed PyPDDLEngine, a PDDL simulation engine allowing LLMs to plan step-by-step. Testing on Blocksworld problems showed agentic LLM planning achieved 66.7% success versus 63.7% for direct planning, but at significantly higher computational cost.

Mar 9, 2026 · 75% relevant

AI Revolutionizes Home Design: How Drafted Transforms Months of Planning Into Hours

Drafted, an AI-powered home design system, is transforming residential architecture by condensing months of early-stage planning into hours. The platform integrates local building regulations and practical constraints to create feasible designs from the start, serving architects, homebuyers, and builders simultaneously.

Mar 7, 2026 · 85% relevant

Wikipedia Navigation Challenge Exposes Critical Gaps in AI Planning Abilities

Researchers introduce LLM-WikiRace, a benchmark testing how well AI models navigate Wikipedia links between concepts. While top models like Gemini-3 show superhuman performance on easy tasks, success rates plummet to just 23% on hard challenges, revealing fundamental limitations in long-term planning.

Feb 20, 2026 · 70% relevant

Claude AI Adds Meal Planning Feature, Aims at Nutritionist Market

Anthropic's Claude AI assistant has been updated to create detailed weekly meal plans tailored to user-defined nutrition targets. This feature expansion moves Claude into the health and wellness productivity space, competing with specialized apps.

Apr 19, 2026 · 85% relevant

How Spec-Driven Development with Claude Code Cuts Planning Time by 80%

A developer's workflow for using detailed spec files as the single source of truth for Claude Code, enabling precise, autonomous feature generation.

Apr 9, 2026 · 100% relevant

China Launches Decentralized AI Push for K-12 Grading, Lesson Planning

China is directing its K-12 schools to implement commercial AI systems for teacher assistance, grading, and student monitoring. This creates a large-scale, decentralized national project with minimal central funding.

Apr 6, 2026 · 97% relevant

Claude Code's /ultraplan Command Offloads Complex Planning to the Cloud

Ultraplan is a new research preview feature that generates complex coding plans remotely, allowing for targeted feedback and flexible execution either on the web or back in your terminal.

Apr 4, 2026 · 100% relevant

New RL-Guided Planning Framework Boosts Warehouse Robot Throughput

Researchers propose RL-RH-PP, a hybrid AI framework combining reinforcement learning with classical search for lifelong multi-agent path finding. It dynamically assigns robot priorities to reduce congestion, achieving higher throughput in simulations and generalizing across layouts.

Mar 26, 2026 · 95% relevant

ServiceNow Research Launches EnterpriseOps-Gym: A 512-Tool Benchmark for Testing Agentic Planning in Enterprise Environments

ServiceNow Research and Mila have released EnterpriseOps-Gym, a high-fidelity benchmark with 164 database tables and 512 tools across eight domains to evaluate LLM agents on long-horizon enterprise workflows.

Mar 18, 2026 · 95% relevant

PseudoAct: How Pseudocode Planning Could Revolutionize AI Agent Decision-Making

Researchers have developed PseudoAct, a new framework that enables AI agents to plan complex tasks using pseudocode before execution. This approach addresses critical limitations in current reactive systems, reducing redundant actions and improving efficiency in long-horizon tasks by up to 20.93%.

Mar 2, 2026 · 75% relevant

GDPval Benchmark Reveals AI's Professional Competence: A New Tool for Economic Planning

A new interactive demonstration using OpenAI's GDPval benchmark shows current AI capabilities across economically valuable professional tasks. The project aims to make AI's real-world impact tangible for policymakers and civil society organizations, bridging the gap between technical assessments and practical economic decisions.

Feb 20, 2026 · 75% relevant

OpenAI Reportedly Planning Premium ChatGPT Tiers with Higher Rate Limits

OpenAI appears to be preparing new premium ChatGPT subscription tiers priced at $100 and $200 per month, offering 5x and 20x higher usage rates respectively. This move signals a strategic shift toward serving power users and enterprise customers who require more intensive AI interactions.

Mar 11, 2026 · 85% relevant

Schnucks and VitalityIP Launch Agentic Commerce Shopping Assistant Powered

Schnuck Markets and VitalityIP launched the first agentic commerce shopping assistant in grocery, powered by Google Cloud. It autonomously handles multi-step tasks like reordering and meal planning, moving beyond simple chatbots.

Jul 22, 2026 · 98% relevant

UK Grants Data Centers 'National Importance' Status, Overriding Local Regs

The UK grants data centers 'national importance' status, overriding local planning rules to speed construction and attract investment.

Jul 7, 2026 · 82% relevant

DonnyClaude: A Verified Workflow Engine That Makes Claude Code Actually

DonnyClaude adds a durable planning layer and deterministic verification gates to Claude Code so the model can't mark work done until tests and checks pass. Install with npx donnyclaude.

Jul 6, 2026 · 98% relevant

Muxer: Open-Source Model Multiplexer Slashes Claude Code Costs by Routing

Muxer reduces Claude Code costs by multiplexing models per subtask via agent frontmatter and session hooks. Keep Fable/Opus for planning; route boilerplate to Haiku.

Jul 2, 2026 · 70% relevant

PlanBench-XL: GPT-5.4 Scores 11.36% on Hard Tool-Use Tasks

PlanBench-XL shows GPT-5.4 drops from 51.90% to 11.36% accuracy on long-horizon tool-use tasks with 1,665 tools, revealing a fundamental planning weakness.

Jun 28, 2026 · 90% relevant

Qwen-Image-Agent: Alibaba's Agentic Framework for Context-Aware Image Gen

Alibaba's Qwen-Image-Agent uses planning, reasoning, search, and memory to build context for text-to-image models, bridging the context gap in real-world generation.

Jun 26, 2026 · 87% relevant

Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim

A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.

May 16, 2026 · 84% relevant

Voyagier Launches AI Trip Planner for Luxury Travel Booking

Voyagier launched AI trip planning for luxury travel, combining generative AI itineraries with human concierges for bookings.

May 11, 2026 · 92% relevant

LLMs Fail at Implicit Travel Constraints, New Benchmark Shows

LLMs fail at implicit travel constraints. A new arXiv paper decomposes planning into 5 atomic skills and finds structural biases and ineffective self-correction.

May 7, 2026 · 64% relevant
