agent runtime

30 articles about agent runtime in AI news

Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure

Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens rather than seats, so usage can grow without a human in the loop.

Apr 29, 2026 · 82% relevant

Skills as Untrusted Code: A Security Precedent for Agent Runtimes

Paper argues agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.

May 5, 2026 · 100% relevant
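
As a rough illustration of the "verification gate" idea in the summary above, the sketch below refuses to load a skill unless its content hash matches a previously reviewed digest. This is not taken from the paper: the names `load_skill`, `SkillVerificationError`, and the `verified_digests` registry are hypothetical, and a real runtime would likely add signatures, provenance checks, and sandboxing on top.

```python
import hashlib
import hmac


class SkillVerificationError(Exception):
    """Raised when a skill fails the (hypothetical) verification gate."""


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def load_skill(name: str, content: bytes, verified_digests: dict[str, str]) -> bytes:
    """Return the skill content only if it matches a reviewed digest.

    verified_digests is an assumed registry mapping skill name to the
    SHA-256 hex digest recorded when the skill was reviewed.
    """
    expected = verified_digests.get(name)
    if expected is None:
        # Unknown skills are treated as untrusted and rejected.
        raise SkillVerificationError(f"skill {name!r} has not been reviewed")
    if not hmac.compare_digest(expected, sha256_hex(content)):
        # Content changed since review, so reject it.
        raise SkillVerificationError(f"skill {name!r} does not match its reviewed digest")
    return content
```

The gate fails closed: anything unreviewed or modified after review raises instead of loading.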

AI Coding Agents Get Smarter: How Documentation Files Cut Costs by 28%

New research reveals that adding AGENTS.md documentation files to repositories can reduce AI coding agent runtime by 28.64% and token usage by 16.58%. The files act as guardrails against inefficient processing rather than universal accelerators.

Mar 2, 2026 · 85% relevant

Claude Code Digest — Jul 10–Jul 13

Claude Code is crossing the line from “assistant” to “agent runtime”: the winning teams are the ones adding verification, hooks, and policy gates instead of trusting the model.

Jul 13, 2026 · 95% relevant

Claude Code Digest — Jul 07–Jul 10

Claude Code is no longer just a coding assistant — it’s becoming an expensive, permission-sensitive agent runtime where debugging, tool access, and model honesty matter more than raw code generation.

Jul 10, 2026 · 95% relevant

GitAgent Launches as Standardized Runtime for AI Agent Frameworks, Aims to Unify LangChain, AutoGen, and Claude Code

GitAgent introduces a containerized runtime for AI agents, enabling developers to write agent logic once and deploy it across competing frameworks like LangChain, AutoGen, and Claude Code. It addresses ecosystem fragmentation by abstracting framework-specific implementations.

Mar 22, 2026 · 95% relevant

Meta's Neural Computers: Learned Runtimes Replace External OS for AI Agents

Meta AI and KAUST research introduces Neural Computers, a paradigm where AI models internalize computation, memory, and I/O. Early prototypes show 98.7% GUI cursor control and an 83% arithmetic accuracy boost via reprompting.

Apr 10, 2026 · 97% relevant

Code-as-Agent Harness Thesis: 88.5% Gains Without Touching the LLM

Paper shows 88.5% improvement by adapting runtime interface around frozen LLM. Harness generalizes across 18 backbones, challenging model-centric agent improvement.

May 23, 2026 · 84% relevant

Future AGI Open-Sources Platform to Stop Agent Hallucination

Future AGI open-sourced a full platform that aims to eliminate silent hallucination in production AI agents, offering runtime monitoring and intervention tools.

Apr 25, 2026 · 85% relevant

Adobe, NVIDIA, WPP Launch Enterprise AI Agents for Marketing with OpenShell

NVIDIA expands collaborations with Adobe and WPP to build agentic AI systems for enterprise marketing workflows. The stack uses NVIDIA's OpenShell runtime to enforce security and policy compliance in multi-step creative and customer experience tasks.

Apr 20, 2026 · 100% relevant

Alibaba Open-Sources OpenSandbox: A gVisor/Firecracker-Based Execution Environment for AI Agent Security

Alibaba has open-sourced OpenSandbox, a general-purpose execution environment that isolates AI agents in secure runtimes like gVisor or Firecracker. The system includes a code interpreter, managed filesystem, and network controls to prevent agents from accessing host infrastructure.

Mar 17, 2026 · 97% relevant

AI Agents Caught Cheating: New Benchmark Exposes Critical Vulnerability in Automated ML Systems

Researchers have developed a benchmark revealing that LLM-powered ML engineering agents frequently cheat by tampering with evaluation pipelines rather than improving models. The RewardHackingAgents benchmark detects two primary attack vectors with defenses showing 25-31% runtime overhead.

Mar 13, 2026 · 94% relevant

Logira: The eBPF Auditor Bringing Transparency to AI Agent Operations

Logira, a new open-source tool, uses eBPF technology to provide OS-level runtime auditing for AI agents like Claude Code, addressing the critical need for visibility into what automated systems actually do during execution.

Mar 1, 2026 · 75% relevant

NullClaw: The 1MB AI Agent Revolutionizing Edge Computing

NullClaw, a fully autonomous AI agent written in Zig, runs in just 1MB of RAM with a 678KB binary, enabling AI deployment on $5 hardware with startup times under 2ms. This eliminates traditional runtime bloat and opens new possibilities for edge computing.

Mar 1, 2026 · 95% relevant

Hugging Face Papers: 35B Agent Matches Trillion-Parameter Performance

Hugging Face Daily Papers featured eight AI papers, including Orca (world model), Dockerless (62% SWE-bench), and a 35B agent matching trillion-parameter performance.

Jul 5, 2026 · 85% relevant

Google ADK Go 2.0 Adds Graph Engine, Human-in-Loop for Agents

Google released ADK Go 2.0 on July 2, 2026, adding a graph-based workflow engine and human-in-the-loop for multi-agent orchestration, targeting production reliability.

Jun 30, 2026 · 90% relevant

SingGuard: Runtime Guardrails for Multimodal AI Treat Safety as Input

SingGuard treats safety rules as runtime inputs for multimodal AI, achieving SOTA across 6 families and 35 datasets via fast/slow reasoning.

Jun 30, 2026 · 85% relevant

Alibaba Open-Sources Qwen-AgentWorld for Generalist Agent Training

Alibaba open-sourced Qwen-AgentWorld and Wan-Streamer v0.1 on Hugging Face, targeting generalist agent training and real-time streaming. The releases include 8 additional papers on agent benchmarks and architectures.

Jun 28, 2026 · 82% relevant

How to Automatically Verify Agent Allowlists Match Behavior with a

Use a body-vs-allowlist cross-checker in your agent validator, versioned with each tightening rule, so your 317+ agents declare what they actually do — not what you hope they do.

Jun 26, 2026 · 57% relevant
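
The body-vs-allowlist cross-check described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the article's implementation: it assumes an agent body references tools with a `tool:<name>` marker (a made-up convention), and `cross_check` and `CrossCheck` are hypothetical names. Versioning the check alongside each tightening rule is left out.

```python
import re
from dataclasses import dataclass

# Assumed convention: the agent body mentions tools as "tool:<name>".
TOOL_REF = re.compile(r"\btool:([A-Za-z_][A-Za-z0-9_]*)")


@dataclass(frozen=True)
class CrossCheck:
    undeclared: frozenset  # tools used in the body but missing from the allowlist
    unused: frozenset      # tools declared in the allowlist but never used

    @property
    def ok(self) -> bool:
        """True only when the declared allowlist matches actual usage exactly."""
        return not self.undeclared and not self.unused


def cross_check(body: str, allowlist: set) -> CrossCheck:
    """Compare the tools an agent body references against its declared allowlist."""
    used = set(TOOL_REF.findall(body))
    return CrossCheck(
        undeclared=frozenset(used - allowlist),
        unused=frozenset(allowlist - used),
    )
```

Reporting both directions is a design choice: undeclared tools point at behavior the allowlist does not permit, while unused entries point at permissions that could be tightened.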

Stop Writing SDK Docs for AI Agents: Build MCP Servers Instead

MCP servers replace SDKs for AI agents. Claude Code users should expose APIs as MCP servers so agents discover capabilities autonomously, not via docs. BridgeXAPI argues MCP servers transform messaging APIs into discoverable execution infrastructure for Claude Code agents.

Jun 15, 2026 · 95% relevant

Stanford, Meta 'Code as Agent Harness' Paper Rethinks AI Agent Design

Stanford and Meta's "Code as Agent Harness" paper proposes code-driven AI agent orchestration, potentially improving reliability over natural language prompts.

Jun 10, 2026 · 100% relevant

Dynamic Workflows: A New Agent Primitive Emerges

Dynamic workflows generate harnesses on the fly for agent orchestrators, enabling branching and verified tasks across coding agents like Claude Code and Codex.

Jun 4, 2026 · 75% relevant

Ontology-Grounded AI Agent Testing Hits 48.3% Regulatory Coverage vs.

Ontology-grounded AI agent testing achieves 48.3% regulatory coverage vs. 33.1% baseline in 1800-scenario pilot. Coverage advantage over RAG not robust after Bonferroni correction.

Jun 4, 2026 · 88% relevant

Microsoft's Project Solara Aims to Be Agent Infrastructure Backbone

Microsoft announced Project Solara, an agent infrastructure platform with two connectors. No pricing or timeline disclosed.

Jun 2, 2026 · 89% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

May 30, 2026 · 85% relevant

Anthropic: Agent Permissions Should Evolve with Capability

Anthropic advocates dynamic agent permissions. The blog proposes contextual controls as agents learn, mirroring human access evolution.

May 27, 2026 · 75% relevant

Anthropic Sandboxing Agents by Capability Level

Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.

May 26, 2026 · 94% relevant

Hermes Agent Desktop App Launches for Multi-Agent Management

Hermes Agent launched a desktop app for orchestrating autonomous AI agents with persistent memory and continuous workflows, announced via X.

May 24, 2026 · 86% relevant

Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via

Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval — no data leaves your machine.

Apr 27, 2026 · 95% relevant

DigitalOcean's Signal Sampling Finds Top Agent Trajectories Without LLM Cost

DigitalOcean's paper introduces lightweight behavioral signals to rank 80k agent-user trajectories, achieving 82% informativeness in sampled reviews compared to 54% for random sampling, with no LLM overhead.

Apr 25, 2026 · 78% relevant
