agents

30 articles about agents in AI news

OSWorld 2.0 Launches, Tests AI Agents on 1,500 Desktop Tasks

Epoch AI released OSWorld 2.0 with 1,500 desktop tasks, up from 369 in v1, testing AI agents on adversarial and cross-application workflows.

Jun 27, 2026 · 95% relevant

OPID: Agents Learn From Hindsight Without External Memory

OPID lets agents learn hierarchical skills from hindsight, improving sample efficiency on ALFWorld, WebShop, and Search QA without requiring external memory at inference.

Jun 26, 2026 · 82% relevant

MoEngage Buys Aampe for Tens of Millions, Bets AI Agents Replace Campaigns

MoEngage acquired Aampe for tens of millions to embed per-customer AI agents, targeting migrations from Salesforce and Adobe Marketing Cloud.

Jun 23, 2026 · 71% relevant

x402 Lets Agents Pay Per API Call — No Wallet Required

x402 lets AI agents pay per API call via USDC micropayments. The Stall now adds a fiat rail: buy credits with a card, starting at $5 per 100 calls.

Jun 21, 2026 · 65% relevant

UnitedHealth Bets $3B on AI Agents to Fix the Denial Machine It Built

UnitedHealth Group committed $3 billion to AI agents that call doctors, read charts to nurses, and process claims — a bet that the insurer that drew fury over algorithmic denials can use the same class of technology to restore trust. Under new CEO Stephen Hemsley, the company targets a 30% cut in pr

Jun 19, 2026 · 92% relevant

Movable Ink Launches Programmatic CRM With AI Agents for Personalized

Movable Ink launched Programmatic CRM with AI agents on June 18, 2026, automating personalized content creation and customer engagement for brands. The platform leverages real-time data to generate tailored content across email, web, and mobile, reducing manual effort while scaling personalization.

Jun 16, 2026 · 98% relevant

Stop Writing SDK Docs for AI Agents: Build MCP Servers Instead

MCP servers replace SDKs for AI agents. BridgeXAPI argues that MCP servers transform messaging APIs into discoverable execution infrastructure for Claude Code agents: expose APIs as MCP servers so agents discover capabilities autonomously rather than through docs.

Jun 15, 2026 · 95% relevant

MCP Agents Log 'Success: True' While Tasks Go Nowhere — Protocol Bug

MCP returns null results inside HTTP 200 responses, causing agents to log success while tasks never run. Vouqis proxy catches this with structured audit logs.

Jun 15, 2026 · 95% relevant

NVIDIA Blackwell Ultra Leads First Agentic AI Benchmark, 20x Agents/MW vs Hopper

NVIDIA Blackwell Ultra NVL72 leads the first AgentPerf benchmark for agentic AI, delivering 20x more agents per megawatt than Hopper.

Jun 12, 2026 · 92% relevant

SMAC-Talk: StarCraft Benchmark Tests LLM Agents Against Deceptive Allies

SMAC-Talk extends StarCraft Multi-Agent Challenge with natural language communication, testing LLM agents against deceptive allies. Qwen3.5 models benchmarked; no model exceeds 72% win rate.

Jun 5, 2026 · 70% relevant

DeepMind paper: hidden web content hijacks agents 86% of the time

DeepMind catalogues 6 attack types where hidden web content hijacks AI agents up to 86% of the time, reframing safety from model alignment to environment trust.

Jun 4, 2026 · 100% relevant

Multi-Agent Systems Hit Diminishing Returns Past 4 Agents

Adding more agents to LLM-driven multi-agent systems degrades performance past a task-dependent optimum, with weaker models peaking at 4 agents and stronger ones at 2.

Jun 2, 2026 · 100% relevant

Google Launches Free 5-Day AI Agents Course, 1.5M Enrolled Last Run

Google launched a free 5-day AI Agents course, following 1.5M learners in the prior edition. The curriculum covers vibe coding, multi-agent systems, and production deployment on Kaggle.

May 31, 2026 · 87% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

May 30, 2026 · 85% relevant

AgingBench: AI Agents Lose Reliability Over Time & Memory Fails

UT Austin paper finds AI agents degrade over time via memory errors. Proposes AgingBench to measure reliability decay across sessions.

May 28, 2026 · 100% relevant

Microsoft RAMPART Brings Pytest-Based Safety Testing to AI Agents

Microsoft's RAMPART brings pytest-native safety testing to AI agents, covering adversarial attacks and benign failures, addressing a critical gap in agent development.

May 27, 2026 · 89% relevant

Anthropic Sandboxing Agents by Capability Level

Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.

May 26, 2026 · 94% relevant

Compute Shortage to Split AI Market: Rich Get Agents, Poor Get Chatbots

Mollick warns that a compute shortage is making agents expensive while chatbots get cheaper, splitting the AI market by company resources.

May 21, 2026 · 75% relevant

NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research

IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.

May 19, 2026 · 85% relevant

Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs

Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.

May 19, 2026 · 75% relevant

Stanford AI Agents Outperform Human Hackers in Penetration Test

Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.

May 18, 2026 · 85% relevant

AgentStop Cuts Local AI Agent Energy by 15-20% With Minimal Performance Loss

AgentStop cuts local AI agent energy by 15-20% with <5% utility loss using token log-probabilities.

May 18, 2026 · 85% relevant

The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria

The /goal pattern goes mainstream across coding agents. Effective goals require acceptance criteria-like conditions to avoid loops or hallucinated success.

May 15, 2026 · 83% relevant

Collider-Bench Tests LLM Agents on LHC Analysis Reproduction

Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.

May 15, 2026 · 92% relevant

Fake Done: Why AI Coding Agents Ship Incomplete Work

Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.

May 12, 2026 · 84% relevant

Nokia Deploys Agentic AI Agents Across Fixed Network Platforms

Nokia launched AI agents across its fixed network platforms to automate troubleshooting and accelerate fiber deployment by 25%.

May 12, 2026 · 85% relevant

Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents

Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. Per @mweinbach, a CPU bottleneck, not inference speed, limited performance.

May 11, 2026 · 92% relevant

AWS Builds First Payment API for Agentic AI — Agents Can Now Checkout

AWS launched the first payment API for autonomous agents, enabling agent-initiated transactions. This closes a critical gap for enterprise retail agentic AI workflows.

May 7, 2026 · 88% relevant

OpenClaw-RL Trains AI Agents on Conversation Feedback Without Manual Labels

OpenClaw-RL trains AI agents on natural conversation feedback, removing manual labeling. Uses evaluative and directive signals for continuous learning.

May 6, 2026 · 85% relevant

Anthropic Ships 10 Finance AI Agents as IPO Race with OpenAI Heats Up

Anthropic released 10 finance AI agents with Moody's data connectors. The launch intensifies the IPO race with OpenAI, backed by a $1.5B private equity JV.

May 5, 2026 · 98% relevant
