multi turn agents
30 articles about multi turn agents in AI news
SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld
SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.
Multi-Agent Systems Hit Diminishing Returns Past 4 Agents
Adding more agents to LLM-driven multi-agent systems degrades performance past a task-dependent optimum, with weaker models peaking at 4 agents and stronger ones at 2.
Multi-User LLM Agents Struggle: Gemini 3 Pro Scores 85.6% on Muses-Bench
A new benchmark reveals LLMs struggle with multi-user scenarios where agents face conflicting instructions. Gemini 3 Pro leads but only achieves 85.6% average, with privacy-utility tradeoffs proving particularly difficult.
Forge: The Open-Source TUI That Turns Claude Code into a Multi-Model Swarm
Forge is a new open-source tool that orchestrates multiple AI coding agents (including Claude Code) using git-native isolation and semantic context management to overcome token limits.
New Research Paper Identifies Multi-Tool Coordination as Critical Failure Point for AI Agents
A new research paper posits that the primary failure mode for AI agents is not in calling individual tools, but in reliably coordinating sequences of many tools over extended tasks. This reframes the core challenge from single-step execution to multi-step orchestration and state management.
How RepoWire Turns Your Claude Code Sessions into a Multi-Agent Network
RepoWire orchestrates multiple Claude Code instances to work in parallel, letting you run specialized agents simultaneously for faster, more comprehensive development tasks.
Kelos: The Kubernetes Framework That's Turning AI Coding Agents Into Self-Developing Systems
Kelos introduces a Kubernetes-native framework for orchestrating autonomous AI coding agents through declarative YAML workflows. This approach transforms AI-assisted development from manual interactions to continuous, automated pipelines that can self-improve projects.
Shard: Run 4 Claude Code Agents in Parallel to Slash Task Times by 75%
Shard orchestrates multiple Claude Code agents to work on decomposed tasks simultaneously using git worktrees, turning 45-minute serial jobs into 12-minute parallel runs.
MCP Agents Log 'Success: True' While Tasks Go Nowhere — Protocol Bug
MCP returns null results inside HTTP 200 responses, causing agents to log success while tasks never run. Vouqis proxy catches this with structured audit logs.
MiniMax Launches MMX-CLI, First Infrastructure Built for AI Agents
MiniMax released MMX-CLI, a CLI built for AI agents, not humans. It provides agents with seven multimodal 'senses' and native integration with popular AI coding environments.
Claude Managed Agents: How to Build on the Platform Instead of in Its Gaps
Claude Managed Agents turns long-running, stateful agents into an API call. For developers, this means building durable applications on a stable platform, not temporary solutions in its gaps.
Stanford Paper: More AI Agents Can Reduce Performance, Not Improve It
A new Stanford paper shows that increasing the number of AI agents in a multi-agent system can lead to worse overall performance, contradicting the common 'more agents, better results' intuition. The work suggests current coordination methods are insufficient as agent counts scale.
LLM Multi-Agent Framework 'Shared Workspace' Proposed to Improve Complex Reasoning via Task Decomposition
A new research paper proposes a multi-agent framework where LLMs split complex reasoning tasks across specialized agents that collaborate via a shared workspace. This approach aims to overcome single-model limitations in planning and tool use.
PodcastBrain: A Technical Breakdown of a Multi-Agent AI System That Learns User Preferences
A developer built PodcastBrain, an open-source, local AI podcast generator where two distinct agents debate any topic. The system learns user preferences via ratings and adjusts future content, demonstrating a working feedback loop with multi-agent orchestration.
Research Paper 'Can AI Agents Agree?' Finds LLM-Based Groups Fail at Simple Coordination
A new study demonstrates that groups of LLM-based AI agents cannot reliably reach consensus on simple decisions, with failure rates increasing with group size. This challenges the common developer assumption that multi-agent systems will naturally converge through discussion.
LLMs Score Only 22% Win Rate in Multi-Agent Clue Game, Revealing Deductive Reasoning Gaps
Researchers created a text-based Clue game to test LLM agents' multi-step deductive reasoning. Across 18 games with GPT-4o-mini and Gemini-2.5-Flash agents, only 4 correct wins were achieved, showing fine-tuning on logic puzzles doesn't reliably improve performance.
Multi-Agent AI Systems: Architecture Patterns and Governance for Enterprise Deployment
A technical guide outlines four primary architecture patterns for multi-agent AI systems and proposes a three-layer governance framework. This provides a structured approach for enterprises scaling AI agents across complex operations.
New Research Proposes 'Level-2 Inverse Games' to Infer Agents' Conflicting Beliefs About Each Other
MIT researchers propose a 'level-2' inverse game theory framework to infer what each agent believes about other agents' objectives, addressing limitations of current methods that assume perfect knowledge. This has implications for modeling complex multi-agent interactions.
Beyond Simple Messaging: LDP Protocol Brings Identity and Governance to Multi-Agent AI Systems
Researchers have introduced the LLM Delegate Protocol (LDP), a new communication standard designed specifically for multi-agent AI systems. Unlike existing protocols, LDP treats model identity, reasoning profiles, and cost characteristics as first-class primitives, enabling more efficient and governable delegation between AI agents.
Jinn: Run Claude Code as a Multi-Agent Team with Cron Jobs and Slack Integration
Jinn is an open-source gateway daemon that turns Claude Code CLI into a multi-agent system with scheduling, Slack integration, and a web dashboard.
NVIDIA Blackwell Ultra Leads First Agentic AI Benchmark, 20x Agents/MW vs Hopper
NVIDIA Blackwell Ultra NVL72 leads the first AgentPerf benchmark for agentic AI, delivering 20x more agents per megawatt than Hopper.
Anthropic: AI agents fail biology retrieval, miss 261 Ebola sequences
Anthropic research shows Claude Sonnet 4 returning 5–106 Ebola sequences instead of 266, shifting outbreak origin from 2014 to 1922. Repeatable retrieval tool fixes the variance.
Compute Shortage to Split AI Market: Rich Get Agents, Poor Get Chatbots
Mollick warns compute shortage makes agents expensive while chatbots cheapen, splitting AI market by company resources.
CLAUDE.md Wastes 7K+ Tokens Per Turn; Skills Cut to 50
A 1,000-line CLAUDE.md burns 7,000-10,000 tokens per turn on instructions the model already knows. Skills using progressive disclosure cut that to ~50 tokens.
MNEMA: A Witness Lattice for Multi-Agent AI Memory
Today's agentic AI fails three ways: agents miscoordinate, memory gets quietly poisoned, and decisions can't be audited. A new EUMAS 2026 submission argues the fix is to stop treating memory as static records. Make it *living* — every memory unit becomes an autonomous cryptographic witness that interacts with other witnesses (agree, disagree, give birth to new witnesses, split, coalesce, retire), and decisions emerge from a fixed signed protocol rather than from a single orchestrator.
Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure
Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens, not seats, enabling higher volume without human-in-the-loop.
Build Reusable Data Science Workflows with Claude Skills and Subagents
Claude Skills and Subagents let you package prompts into reusable modules, freeing data scientists from repetitive AI adjustments for EDA, modeling, and deployment.
Agent Harnessing: The Infrastructure That Makes AI Agents Work
A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination — is what separates production-grade agents from those that fail silently.
OpenAI Launches GPT-5.5: Smarter Agents, Deeper Tool Use
OpenAI unveiled GPT-5.5, positioned as a new intelligence tier designed for real-world work and autonomous agents, with enhanced tool-use capabilities and complex goal understanding.
Why Zed's Parallel Agents Won't Fix Your Real Bottleneck (And What Will)
Zed's parallel agents cut refactoring time 60% on independent modules but introduced conflicts on shared dependencies. The bottleneck isn't speed — it's context window limits.