Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

agents

30 articles about agents in AI news

NanoGPT-Bench: A New Eval for Coding Agents Doing AI Research

IntologyAI released NanoGPT-Bench, an internal eval for coding agents on an AI R&D problem. No results or task specifics have been disclosed.

85% relevant

Neo4j's agent-memory: Open-source unified memory for AI agents via knowledge graphs

Neo4j releases agent-memory, an open-source unified memory layer for AI agents using knowledge graphs, enabling persistent structured recall.

75% relevant

Stanford AI Agents Outperform Human Hackers in Penetration Test

Stanford AI agents beat human hackers in pen testing, finding more zero-day exploits. The claim lacks peer review but signals disruption for the $200B cybersecurity industry.

85% relevant

AgentStop Cuts Local AI Agent Energy by 15-20% With Minimal Performance Loss

AgentStop cuts local AI agent energy by 15-20% with <5% utility loss using token log-probabilities.

85% relevant

The /goal Pattern Goes Mainstream — Agents Need Acceptance Criteria

The /goal pattern goes mainstream across coding agents. Effective goals require acceptance criteria-like conditions to avoid loops or hallucinated success.

83% relevant

Collider-Bench Tests LLM Agents on LHC Analysis Reproduction

Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.

92% relevant

Fake Done: Why AI Coding Agents Ship Incomplete Work

Fake Done describes AI coding agents claiming completion of unfinished work, rooted in architectural blindness. Deterministic verification outside the agent offers a fix.

84% relevant

Nokia Deploys Agentic AI Agents Across Fixed Network Platforms

Nokia launched agentic AI agents across its fixed network platforms to automate troubleshooting and accelerate fiber deployment by 25%.

85% relevant

Snapdragon X2 Elite Beats Intel Arrow Lake for AI Coding Agents

Snapdragon X2 Elite beat Intel Arrow Lake for Windows AI coding agents. CPU bottleneck, not inference speed, limited performance per @mweinbach.

92% relevant

AWS Builds First Payment API for Agentic AI — Agents Can Now Checkout

AWS launched first payment API for autonomous agents, enabling agent-initiated transactions. Closes critical gap for enterprise retail agentic AI workflows.

88% relevant

OpenClaw-RL Trains AI Agents on Conversation Feedback Without Manual Labels

OpenClaw-RL trains AI agents on natural conversation feedback, removing manual labeling. Uses evaluative and directive signals for continuous learning.

85% relevant

Anthropic Ships 10 Finance AI Agents as IPO Race with OpenAI Heats Up

Anthropic released 10 finance AI agents with Moody's data connectors. The launch intensifies the IPO race with OpenAI, backed by a $1.5B private equity JV.

98% relevant

Anthropic Launches Wall Street Agents, $1.5B JV with Blackstone

Anthropic launched financial services AI agents on Claude Opus 4.7 and a $1.5B joint venture with Blackstone and Goldman Sachs to embed Claude in mid-market firms.

100% relevant

Meta Deploys AI Agents to Automate Hyperscale Performance Tuning

Meta deployed unified AI agents to automate hyperscale performance optimization, aiming to reduce manual tuning and costs amid a $145B AI capex push.

78% relevant

Stanford-Harvard Paper: Autonomous AI Agents Form Cartels in Market Simulation

Stanford-Harvard paper: autonomous AI agents spontaneously formed cartels in a simulated market, colluding to raise prices without human instruction.

100% relevant

OpenAI Agents Now Ask Questions Good Enough for Research Papers

Sébastien Bubeck revealed on the OpenAI Podcast that internal AI agents now ask research questions so insightful they're inspiring papers and correcting published mistakes, with a 1-2 year timeline for full researcher-level capabilities.

85% relevant

The Agency: 147 Open Source AI Agents Hit 50K GitHub Stars in 2 Weeks

The Agency is an open source repository with 147 specialized AI agents across 12 divisions (engineering, design, marketing, etc.) that hit 50K GitHub stars in under two weeks. It provides one-command install for tools like Claude Code and Cursor, with full modding support.

86% relevant

Microsoft's Playwright MCP Server Replaces Vision for Web Agents

Microsoft built an MCP server for Playwright that lets AI agents interact with web pages using the accessibility tree, eliminating the need for screenshots and vision models. This approach reduces hallucinations and broken selectors, working with tools like Cursor, VS Code, and Claude Desktop.

100% relevant

Build Reusable Data Science Workflows with Claude Skills and Subagents

Claude Skills and Subagents let you package prompts into reusable modules, freeing data scientists from repetitive AI adjustments for EDA, modeling, and deployment.

99% relevant

Agent Harnessing: The Infrastructure That Makes AI Agents Work

A detailed technical guide argues that the model is not the hard part of building AI agents. The six-component harness — context management, memory, tools, control flow, verification, and coordination — is what separates production-grade agents from those that fail silently.

88% relevant

Meta: Code Agents Improve by Reusing Short Summaries, Not Raw Logs

Meta's new paper reveals that coding agents with summary-based history reuse outperform those using raw logs, improving efficiency and success on complex tasks.

85% relevant

OpenAI Launches GPT-5.5: Smarter Agents, Deeper Tool Use

OpenAI unveiled GPT-5.5, positioned as a new intelligence tier designed for real-world work and autonomous agents, with enhanced tool-use capabilities and complex goal understanding.

97% relevant

Agentic storefronts: How AI agents are reshaping the shopping journey from

Major tech companies integrate AI agents into search and checkout; platforms like ChatGPT become primary shopping discovery channels. Agentic storefronts (e.g., Swap) guide shoppers end-to-end, getting smarter per session.

86% relevant

Why Zed's Parallel Agents Won't Fix Your Real Bottleneck (And What Will)

Zed's parallel agents cut refactoring time 60% on independent modules but introduced conflicts on shared dependencies. The bottleneck isn't speed — it's context window limits.

90% relevant

OpenAI Launches ChatGPT Workspace Agents for Team Automation

OpenAI has introduced workspace agents within ChatGPT, powered by Codex, designed to automate complex, multi-step workflows for teams across shared environments like Slack. These agents can gather context, execute tasks, request approvals, and run continuously in the cloud.

97% relevant

Google's Design.md Gives AI Coding Agents a Visual Design Memory

Google introduced Design.md, a file format for storing design tokens and rules that AI coding agents can read to maintain visual consistency, addressing a key failure point in automated UI generation.

95% relevant

AI Agents Now Training Other AI Models, Sparking Autoresearch Trend

AI agents are now being used to train other AI models, creating advanced agentic systems. This development stems from Andrej Karpathy's autoresearch repository and represents early-stage automation of AI research.

75% relevant

Swiss AI Lab Ships Pixel-Based Agents That Control Real Phones

A Swiss AI lab has developed agents that interact with smartphones by processing screen pixels and simulating touch, eliminating the need for app-specific APIs or integrations. This approach mirrors human interaction and could generalize across any app interface.

93% relevant

AI Agents Show Consistent Economic Analysis, Reducing Human Disagreement

A new study finds AI agents like Claude Code and Codex produce economic analyses with far less disagreement than human teams, landing near the human median but with no extreme outliers. This indicates AI's potential for scalable, consistent research support.

85% relevant

Research Paper Proposes Security Framework for Autonomous AI Agents in Commerce

A Systematization of Knowledge (SoK) paper analyzes the emerging threat landscape for autonomous LLM agents conducting commerce. It identifies 12 attack vectors across five dimensions and proposes a layered defense architecture. This is a foundational security analysis for a nascent but high-stakes technology.

100% relevant