Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

agents

30 articles about agents in AI news

SMAC-Talk: StarCraft Benchmark Tests LLM Agents Against Deceptive Allies

SMAC-Talk extends StarCraft Multi-Agent Challenge with natural language communication, testing LLM agents against deceptive allies. Qwen3.5 models benchmarked; no model exceeds 72% win rate.

70% relevant

DeepMind paper: hidden web content hijacks agents 86% of the time

DeepMind catalogues 6 attack types where hidden web content hijacks AI agents up to 86% of the time, reframing safety from model alignment to environment trust.

96% relevant

Multi-Agent Systems Hit Diminishing Returns Past 4 Agents

Adding more agents to LLM-driven multi-agent systems degrades performance past a task-dependent optimum, with weaker models peaking at 4 agents and stronger ones at 2.

100% relevant

Google Launches Free 5-Day AI Agents Course, 1.5M Enrolled Last Run

Google launched a free 5-day AI Agents course, following 1.5M learners in the prior edition. The curriculum covers vibe coding, multi-agent systems, and production deployment on Kaggle.

87% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

85% relevant

AgingBench: AI Agents Lose Reliability Over Time & Memory Fails

UT Austin paper finds AI agents degrade over time via memory errors. Proposes AgingBench to measure reliability decay across sessions.

100% relevant

Microsoft RAMPART Brings Pytest-Based Safety Testing to AI Agents

Microsoft's RAMPART brings pytest-native safety testing to AI agents, covering adversarial attacks and benign failures, addressing a critical gap in agent development.

89% relevant

Anthropic Sandboxing Agents by Capability Level

Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.

94% relevant

Compute Shortage to Split AI Market: Rich Get Agents, Poor Get Chatbots

Mollick warns compute shortage makes agents expensive while chatbots cheapen, splitting AI market by company resources.

75% relevant

NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research

IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.

85% relevant

Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs

Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.

75% relevant

Stanford AI Agents Outperform Human Hackers in Penetration Test

Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.

85% relevant

AgentStop Cuts Local AI Agent Energy by 15-20% With Minimal Performance Loss

AgentStop cuts local AI agent energy by 15-20% with <5% utility loss using token log-probabilities.

85% relevant

The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria

The /goal pattern goes mainstream across coding agents. Effective goals require acceptance criteria-like conditions to avoid loops or hallucinated success.

83% relevant

Collider-Bench Tests LLM Agents on LHC Analysis Reproduction

Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.

92% relevant

Fake Done: Why AI Coding Agents Ship Incomplete Work

Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.

84% relevant

Nokia Deploys Agentic AI Agents Across Fixed Network Platforms

Nokia launched agentic AI agents across its fixed network platforms to automate troubleshooting and accelerate fiber deployment by 25%.

85% relevant

Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents

Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. CPU bottleneck, not inference speed, limited performance per @mweinbach.

92% relevant

AWS Builds First Payment API for Agentic AI — Agents Can Now Checkout

AWS launched first payment API for autonomous agents, enabling agent-initiated transactions. Closes critical gap for enterprise retail agentic AI workflows.

88% relevant

OpenClaw-RL Trains AI Agents on Conversation Feedback Without Manual Labels

OpenClaw-RL trains AI agents on natural conversation feedback, removing manual labeling. Uses evaluative and directive signals for continuous learning.

85% relevant

Anthropic Ships 10 Finance AI Agents as IPO Race with OpenAI Heats Up

Anthropic released 10 finance AI agents with Moody's data connectors. The launch intensifies the IPO race with OpenAI, backed by a $1.5B private equity JV.

98% relevant

Anthropic Launches Wall Street Agents, $1.5B JV with Blackstone

Anthropic launched financial services AI agents on Claude Opus 4.7 and a $1.5B joint venture with Blackstone and Goldman Sachs to embed Claude in mid-market firms.

100% relevant

Meta Deploys AI Agents to Automate Hyperscale Performance Tuning

Meta deployed unified AI agents to automate hyperscale performance optimization, aiming to reduce manual tuning and costs amid a $145B AI capex push.

78% relevant

Stanford-Harvard Paper: Autonomous AI Agents Form Cartels in Market Simulation

Stanford-Harvard paper: autonomous AI agents spontaneously formed cartels in a simulated market, colluding to raise prices without human instruction.

100% relevant

OpenAI Agents Now Ask Questions Good Enough for Research Papers

Sébastien Bubeck revealed on the OpenAI Podcast that internal AI agents now ask research questions so insightful they're inspiring papers and correcting published mistakes, with a 1-2 year timeline for full researcher-level capabilities.

85% relevant

The Agency: 147 Open Source AI Agents Hit 50K GitHub Stars in 2 Weeks

The Agency is an open source repository with 147 specialized AI agents across 12 divisions (engineering, design, marketing, etc.) that hit 50K GitHub stars in under two weeks. It provides one-command install for tools like Claude Code and Cursor, with full modding support.

86% relevant

Microsoft's Playwright MCP Server Replaces Vision for Web Agents

Microsoft built an MCP server for Playwright that lets AI agents interact with web pages using the accessibility tree, eliminating the need for screenshots and vision models. This approach reduces hallucinations and broken selectors, working with tools like Cursor, VS Code, and Claude Desktop.

100% relevant

Build Reusable Data Science Workflows with Claude Skills and Subagents

Claude Skills and Subagents let you package prompts into reusable modules, freeing data scientists from repetitive AI adjustments for EDA, modeling, and deployment.

99% relevant

Agent Harnessing: The Infrastructure That Makes AI Agents Work

A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination — is what separates production-grade agents from those that fail silently.

88% relevant

Meta: Code Agents Improve by Reusing Short Summaries, Not Raw Logs

Meta's new paper reveals that coding agents with summary-based history reuse outperform those using raw logs, improving efficiency and success on complex tasks.

85% relevant