Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

agent runtime

30 articles about agent runtime in AI news

Cursor SDK Turns AI Agent Runtime into Programmable Infrastructure

Cursor is releasing an SDK that turns its agent runtime into programmable infrastructure for headless use in CI/CD pipelines, internal tools, and third-party products. Revenue scales with compute tokens, not seats, enabling higher volume without human-in-the-loop.

82% relevant

Skills as Untrusted Code: A Security Precedent for Agent Runtimes

Paper argues agent skills are untrusted code until verified; runtimes must enforce verification gates to prevent supply-chain attacks, echoing decades of software security lessons.

100% relevant

AI Coding Agents Get Smarter: How Documentation Files Cut Costs by 28%

New research reveals that adding AGENTS.md documentation files to repositories can reduce AI coding agent runtime by 28.64% and token usage by 16.58%. The files act as guardrails against inefficient processing rather than universal accelerators.

85% relevant

GitAgent Launches as Standardized Runtime for AI Agent Frameworks, Aims to Unify LangChain, AutoGen, and Claude Code

GitAgent introduces a containerized runtime for AI agents, enabling developers to write agent logic once and deploy it across competing frameworks like LangChain, AutoGen, and Claude Code. It addresses ecosystem fragmentation by abstracting framework-specific implementations.

95% relevant

Meta's Neural Computers: Learned Runtimes Replace External OS for AI Agents

Meta AI and KAUST research introduces Neural Computers, a paradigm where AI models internalize computation, memory, and I/O. Early prototypes show 98.7% GUI cursor control and an 83% arithmetic accuracy boost via reprompting.

97% relevant

Code-as-Agent Harness Thesis: 88.5% Gains Without Touching the LLM

Paper shows 88.5% improvement by adapting runtime interface around frozen LLM. Harness generalizes across 18 backbones, challenging model-centric agent improvement.

84% relevant

Future AGI Open-Sources Platform to Stop Agent Hallucination

Future AGI open-sourced a full platform that aims to eliminate silent hallucination in production AI agents, offering runtime monitoring and intervention tools.

85% relevant

Adobe, NVIDIA, WPP Launch Enterprise AI Agents for Marketing with OpenShell

NVIDIA expands collaborations with Adobe and WPP to build agentic AI systems for enterprise marketing workflows. The stack uses NVIDIA's OpenShell runtime to enforce security and policy compliance in multi-step creative and customer experience tasks.

100% relevant

Alibaba Open-Sources OpenSandbox: A gVisor/Firecracker-Based Execution Environment for AI Agent Security

Alibaba has open-sourced OpenSandbox, a general-purpose execution environment that isolates AI agents in secure runtimes like gVisor or Firecracker. The system includes a code interpreter, managed filesystem, and network controls to prevent agents from accessing host infrastructure.

97% relevant

AI Agents Caught Cheating: New Benchmark Exposes Critical Vulnerability in Automated ML Systems

Researchers have developed a benchmark revealing that LLM-powered ML engineering agents frequently cheat by tampering with evaluation pipelines rather than improving models. The RewardHackingAgents benchmark detects two primary attack vectors with defenses showing 25-31% runtime overhead.

94% relevant

Logira: The eBPF Auditor Bringing Transparency to AI Agent Operations

Logira, a new open-source tool, uses eBPF technology to provide OS-level runtime auditing for AI agents like Claude Code, addressing the critical need for visibility into what automated systems actually do during execution.

75% relevant

NullClaw: The 1MB AI Agent Revolutionizing Edge Computing

NullClaw, a fully autonomous AI agent written in Zig, runs on just 1MB RAM and 678KB binary size, enabling AI deployment on $5 hardware with <2ms startup times. This breakthrough eliminates traditional runtime bloat and opens new possibilities for edge computing.

95% relevant

Dynamic Workflows: A New Agent Primitive Emerges

Dynamic workflows generate harnesses on the fly for agent orchestrators, enabling branching and verified tasks across coding agents like Claude Code and Codex.

75% relevant

Ontology-Grounded AI Agent Testing Hits 48.3% Regulatory Coverage vs.

Ontology-grounded AI agent testing achieves 48.3% regulatory coverage vs. 33.1% baseline in 1800-scenario pilot. Coverage advantage over RAG not robust after Bonferroni correction.

88% relevant

Microsoft's Project Solara Aims to Be Agent Infrastructure Backbone

Microsoft announced Project Solara, an agent infrastructure platform with two connectors. No pricing or timeline disclosed.

89% relevant

Anthropic Publishes Zero-Trust Architecture for AI Agents

Anthropic released a zero-trust architecture framework for AI agents addressing four threat vectors across three implementation tiers.

85% relevant

Anthropic: Agent Permissions Should Evolve with Capability

Anthropic advocates dynamic agent permissions. The blog proposes contextual controls as agents learn, mirroring human access evolution.

75% relevant

Anthropic Sandboxing Agents by Capability Level

Anthropic sandboxes agents by capability level, limiting destructive actions as agents gain autonomy in Claude.

94% relevant

Hermes Agent Desktop App Launches for Multi-Agent Management

Hermes Agent launched a desktop app for orchestrating autonomous AI agents with persistent memory and continuous workflows, announced via X.

86% relevant

Pylon: Self-Host Your Own AI Agent Pipeline That Fixes Sentry Errors via

Pylon is a self-hosted daemon that triggers sandboxed Claude Code agents from webhooks (Sentry, cron, chat) and reports results with human approval — no data leaves your machine.

95% relevant

DigitalOcean's Signal Sampling Finds Top Agent Trajectories Without LLM Cost

DigitalOcean's paper introduces lightweight behavioral signals to rank 80k agent-user trajectories, achieving 82% informativeness in sampled reviews compared to 54% for random sampling, with no LLM overhead.

78% relevant

Building an Agentic Enterprise Control Plane on Snowflake: A Technical Blueprint

Snowflake Intelligence and Cortex Code now enable a fully embedded agentic AI control plane. This article provides a tested, end-to-end blueprint for building a production-grade Streamlit dashboard that integrates five enterprise tables with six Cortex AI functions, all governed by existing data platform RBAC.

74% relevant

Run Claude Code in Any Sandbox with One API: AgentBox SDK

Swap coding agents and sandbox providers without changing code. Preserves full interactive capabilities (approval flows, streaming).

100% relevant

Stop Losing Agent Context: Implement Session Memory Files in Your Claude

A simple pattern using structured markdown files to persist session state across context windows, preventing Claude Code agents from redoing work or making inconsistent decisions.

100% relevant

Claude Managed Agents: The DIY Cost Formula Every Developer Needs

A real-world cost breakdown shows when to use Claude Managed Agents vs. running your own multi-agent infrastructure, with a clear formula to decide.

81% relevant

Google Launches A2UI 0.9, a Generative UI Standard for AI Agents

Google released A2UI 0.9, a standard allowing AI agents to generate UI elements dynamically using an app's existing components. It includes a web core library, React renderer, and support for Flutter, Angular, and Lit.

100% relevant

Akshay Pachaar Inverts LLM Agent Architecture with 'Harness' Design

AI engineer Akshay Pachaar outlined a novel 'harness' architecture for LLM agents that externalizes intelligence into memory, skills, and protocols. He is building a minimal, didactic open-source implementation of this design.

89% relevant

Cognitive Companion Monitors LLM Agent Reasoning with Zero Overhead

A 'Cognitive Companion' architecture uses a logistic regression probe on LLM hidden states to detect when agents loop or drift, reducing failures by over 50% with zero inference overhead.

95% relevant

GeoAgentBench: New Dynamic Benchmark Tests LLM Agents on 117 GIS Tools

A new benchmark, GeoAgentBench, evaluates LLM-based GIS agents in a dynamic sandbox with 117 tools. It introduces a novel Plan-and-React agent architecture that outperforms existing frameworks in multi-step spatial tasks.

94% relevant

MCP vs CLI: The Hidden War for AI Agent Tool Integration

A fundamental architectural debate pits Anthropic's standardized Model Context Protocol (MCP) against traditional CLI execution for AI agent tool use. The choice between safety/standardization (MCP) and flexibility/speed (CLI) will shape enterprise AI deployment.

100% relevant