agents
30 articles about agents in AI news
OSWorld 2.0 Launches, Tests AI Agents on 1,500 Desktop Tasks
Epoch AI released OSWorld 2.0 with 1,500 desktop tasks, up from 369 in v1, testing AI agents on adversarial and cross-application workflows.
OPID: Agents Learn From Hindsight Without External Memory
OPID lets agents learn hierarchical skills from hindsight, improving sample efficiency on ALFWorld, WebShop, and Search QA without external memory at inference.
MoEngage Buys Aampe for Tens of Millions, Bets AI Agents Replace Campaigns
MoEngage acquired Aampe for tens of millions to embed per-customer AI agents, targeting migrations from Salesforce and Adobe Marketing Cloud.
x402 Lets Agents Pay Per API Call — No Wallet Required
x402 lets AI agents pay per API call via USDC micropayments. The Stall now adds a fiat rail: buy credits with a card, starting at $5 per 100 calls.
UnitedHealth Bets $3B on AI Agents to Fix the Denial Machine It Built
UnitedHealth Group committed $3 billion to AI agents that call doctors, read charts to nurses, and process claims — a bet that the insurer that drew fury over algorithmic denials can use the same class of technology to restore trust. Under new CEO Stephen Hemsley, the company targets a 30% cut in pr
Movable Ink Launches Programmatic CRM With AI Agents for Personalized
Movable Ink launched Programmatic CRM with AI agents on June 18, 2026, automating personalized content creation and customer engagement for brands. The platform leverages real-time data to generate tailored content across email, web, and mobile, reducing manual effort while scaling personalization.
Stop Writing SDK Docs for AI Agents: Build MCP Servers Instead
BridgeXAPI argues MCP servers transform messaging APIs into discoverable execution infrastructure for Claude Code agents. Claude Code users should expose APIs as MCP servers so agents discover capabilities autonomously, not via docs.
MCP Agents Log 'Success: True' While Tasks Go Nowhere — Protocol Bug
MCP returns null results inside HTTP 200 responses, causing agents to log success while tasks never run. Vouqis proxy catches this with structured audit logs.
NVIDIA Blackwell Ultra Leads First Agentic AI Benchmark, 20x Agents/MW vs Hopper
NVIDIA Blackwell Ultra NVL72 leads the first AgentPerf benchmark for agentic AI, delivering 20x more agents per megawatt than Hopper.
SMAC-Talk: StarCraft Benchmark Tests LLM Agents Against Deceptive Allies
SMAC-Talk extends the StarCraft Multi-Agent Challenge with natural language communication, testing LLM agents against deceptive allies. Qwen3.5 models were benchmarked; no model exceeds a 72% win rate.
DeepMind paper: hidden web content hijacks agents 86% of the time
DeepMind catalogues 6 attack types where hidden web content hijacks AI agents up to 86% of the time, reframing safety from model alignment to environment trust.
Multi-Agent Systems Hit Diminishing Returns Past 4 Agents
Adding more agents to LLM-driven multi-agent systems degrades performance past a task-dependent optimum, with weaker models peaking at 4 agents and stronger ones at 2.
Google Launches Free 5-Day AI Agents Course, 1.5M Enrolled Last Run
Google launched a free 5-day AI Agents course after the prior edition drew 1.5M learners. The curriculum covers vibe coding, multi-agent systems, and production deployment on Kaggle.
Anthropic Publishes Zero-Trust Architecture for AI Agents
Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.
AgingBench: AI Agents Lose Reliability Over Time & Memory Fails
UT Austin paper finds AI agents degrade over time via memory errors. Proposes AgingBench to measure reliability decay across sessions.
Microsoft RAMPART Brings Pytest-Based Safety Testing to AI Agents
Microsoft's RAMPART brings pytest-native safety testing to AI agents, covering adversarial attacks and benign failures, addressing a critical gap in agent development.
Anthropic Sandboxing Agents by Capability Level
Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.
Compute Shortage to Split AI Market: Rich Get Agents, Poor Get Chatbots
Mollick warns that a compute shortage is making agents expensive while chatbots get cheaper, splitting the AI market by company resources.
NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research
IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.
Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs
Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.
Stanford AI Agents Outperform Human Hackers in Penetration Test
Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.
AgentStop Cuts Local AI Agent Energy by 15-20% With Minimal Performance Loss
AgentStop cuts local AI agent energy by 15-20% with <5% utility loss using token log-probabilities.
The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria
The /goal pattern goes mainstream across coding agents. Effective goals require acceptance criteria-like conditions to avoid loops or hallucinated success.
Collider-Bench Tests LLM Agents on LHC Analysis Reproduction
Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.
Fake Done: Why AI Coding Agents Ship Incomplete Work
Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.
Nokia Deploys Agentic AI Agents Across Fixed Network Platforms
Nokia launched agentic AI agents across its fixed network platforms to automate troubleshooting and accelerate fiber deployment by 25%.
Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents
Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. According to @mweinbach, a CPU bottleneck, not inference speed, limited performance.
AWS Builds First Payment API for Agentic AI — Agents Can Now Checkout
AWS launched the first payment API for autonomous agents, enabling agent-initiated transactions. It closes a critical gap for enterprise retail agentic AI workflows.
OpenClaw-RL Trains AI Agents on Conversation Feedback Without Manual Labels
OpenClaw-RL trains AI agents on natural conversation feedback, removing manual labeling. Uses evaluative and directive signals for continuous learning.
Anthropic Ships 10 Finance AI Agents as IPO Race with OpenAI Heats Up
Anthropic released 10 finance AI agents with Moody's data connectors. The launch intensifies the IPO race with OpenAI, backed by a $1.5B private equity JV.