How to Automatically Verify Agent Allowlists Match Behavior with a

Use a body-vs-allowlist cross-checker in your agent validator, versioned with each tightening rule, so your 317+ agents declare what they actually do — not what you hope they do.

Ala SMITH & AI Research Desk · 5d ago · 3 min read · AI-Generated

Source: dev.to (via devto_claudecode)

How do I automatically verify my Claude Code agent's tool allowlist matches what its body actually does?

Use a schema-versioned validator that reads both frontmatter and body, strips code blocks, then cross-checks every tool mentioned in the body against the allowlist. Bump the schema version with each tightening rule so drift is auditable.

TL;DR

Your agent's tool allowlist is just a comment until a validation gate cross-checks it against the tools the body actually references.

The Problem: Your Allowlist Is a Comment Until It's Checked

Every Claude Code agent has two surfaces: the frontmatter (tools, name, description) and the body (the prose that tells the agent what to do). The tools field is a runtime allowlist — tools not listed are blocked. But nothing proves the body's actual tool calls match that list.

Two silent failures emerge:

Under-declaration: The body says "collect analytics with umami tools," but tools only lists Read. Every umami call is blocked at runtime. The agent looks valid, passes schema checks, and silently can't do its job.
Over-declaration: The allowlist grants tools the body never uses. The blast radius is overstated — exactly the wrong thing for an allowlist to do.

Neither shows up in a frontmatter-only validator. You can check tools is a well-formed list of real tool names and still have an allowlist that has nothing to do with what the agent does.
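To make the gap concrete, here is a minimal sketch of a frontmatter-only check. The `KNOWN_TOOLS` registry, the agent name, and the `mcp__umami__get_stats` tool name are all hypothetical, chosen only to mirror the umami example above. The under-declared agent passes with no errors because the body is never read.

```python
# Hypothetical registry of valid tool names; a real one depends on your setup.
KNOWN_TOOLS = {"Read", "Write", "Bash", "mcp__umami__get_stats"}


def frontmatter_only_errors(frontmatter: dict) -> list[str]:
    """Check only that `tools` is a non-empty list of known tool names."""
    tools = frontmatter.get("tools")
    if not isinstance(tools, list) or not tools:
        return ["tools must be a non-empty list"]
    return [f"unknown tool: {t}" for t in tools if t not in KNOWN_TOOLS]


# Under-declared agent: the body needs an umami tool, the allowlist only has Read.
agent_frontmatter = {"name": "analytics-reporter", "tools": ["Read"]}
agent_body = "Collect analytics with mcp__umami__get_stats, then summarize."

print(frontmatter_only_errors(agent_frontmatter))  # [] -- the body was never consulted
```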

The Fix: A Schema-Versioned Body-vs-Allowlist Cross-Check

The solution from the source repo (317 agents) is a validator that reads both frontmatter and body, then cross-checks them. Here's the approach:

Step 1: Strip Code Blocks First

Code blocks contain example tool calls that aren't actual behavior. Strip them before analysis:

body = strip_fenced_code(body)  # Remove ``` blocks so examples don't read as tool calls
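The source does not show how `strip_fenced_code` is implemented. One possible sketch uses a regular expression, assuming fences are runs of three or more backticks at the start of a line and are closed by a run of the same length. An unterminated fence is left in place.

```python
import re

# An opening line of 3+ backticks (optionally followed by a language tag),
# then everything up to a closing line made of the same run of backticks.
_FENCED = re.compile(r"^(`{3,})[^\n]*\n.*?^\1[ \t]*$\n?", re.DOTALL | re.MULTILINE)


def strip_fenced_code(body: str) -> str:
    """Remove fenced code blocks so example calls are not read as behavior."""
    return _FENCED.sub("", body)


FENCE = "`" * 3  # built programmatically to keep this example readable
sample = (
    "Use Read.\n"
    + FENCE + "python\ncall mcp__umami__get_stats\n" + FENCE + "\n"
    + "Done.\n"
)
print(repr(strip_fenced_code(sample)))  # 'Use Read.\nDone.\n'
```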

Step 2: Cross-Check Tools Mentioned in Body Against Allowlist

def check_agent_body_vs_allowlist(frontmatter, body):
    body = strip_fenced_code(body)
    declared = mcp_tools_in(frontmatter)  # Tools in the allowlist
    used = mcp_tools_in(body)             # Tools mentioned in body prose
    
    undeclared = used - declared          # Body uses tools not in allowlist
    overgranted = declared - used         # Allowlist grants tools body never uses
    
    return undeclared, overgranted
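The function above relies on two helpers the article does not define. The sketch below fills them in under stated assumptions: the frontmatter is passed as raw text, only MCP-style names of the form `mcp__server__tool` are compared (the helper name `mcp_tools_in` suggests this, but it is a guess), and the umami tool names are hypothetical.

```python
import re

_FENCED = re.compile(r"^(`{3,})[^\n]*\n.*?^\1[ \t]*$\n?", re.DOTALL | re.MULTILINE)
_MCP_TOOL = re.compile(r"\bmcp__[A-Za-z0-9_]+")  # assumed naming pattern


def strip_fenced_code(text: str) -> str:
    """Drop fenced code blocks before scanning for tool names."""
    return _FENCED.sub("", text)


def mcp_tools_in(text: str) -> set[str]:
    """Collect every MCP-style tool name mentioned in the text."""
    return set(_MCP_TOOL.findall(text))


def check_agent_body_vs_allowlist(frontmatter: str, body: str):
    body = strip_fenced_code(body)
    declared = mcp_tools_in(frontmatter)
    used = mcp_tools_in(body)
    return used - declared, declared - used


FENCE = "`" * 3
frontmatter = "name: analytics-reporter\ntools: Read, mcp__umami__get_events\n"
body = (
    "Collect analytics with mcp__umami__get_stats.\n"
    f"{FENCE}\nexample: mcp__umami__delete_site\n{FENCE}\n"
)
undeclared, overgranted = check_agent_body_vs_allowlist(frontmatter, body)
print(undeclared)   # {'mcp__umami__get_stats'}
print(overgranted)  # {'mcp__umami__get_events'}
```

The tool named only inside the fenced example is ignored, which is the reason for stripping code blocks first.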

Step 3: Version the Gate

Every change to what the validator accepts bumps SCHEMA_VERSION and lands a changelog entry. When an agent that passed last month fails today, the version delta tells you which rule moved and where it was signed off.

The progression from the source:

Schema 3.10.0: Flipped from lenient to kernel-strict: name, description, tools, model, color, version, author, and tags became required, and a missing field is now an error
Schema 3.11.0: Added the body-vs-allowlist consistency check
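A versioned gate can be as small as a constant plus a changelog table. The sketch below is an illustration, not the source repo's code: the `CHANGELOG` structure and the `rules_added_since` helper are hypothetical, and the two entries paraphrase the versions listed above.

```python
SCHEMA_VERSION = "3.11.0"

# Version -> rule introduced at that version.
CHANGELOG = {
    "3.10.0": "kernel-strict: missing required frontmatter fields are errors",
    "3.11.0": "body-vs-allowlist consistency check",
}


def parse(version: str) -> tuple[int, ...]:
    """Turn '3.10.0' into (3, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def rules_added_since(last_passing: str, current: str = SCHEMA_VERSION) -> list[str]:
    """List the rules that landed after the version an agent last passed."""
    return [
        f"{version}: {rule}"
        for version, rule in sorted(CHANGELOG.items(), key=lambda kv: parse(kv[0]))
        if parse(last_passing) < parse(version) <= parse(current)
    ]


print(rules_added_since("3.10.0"))
# ['3.11.0: body-vs-allowlist consistency check']
```

Parsing into integer tuples matters here: compared as plain strings, "3.9.0" would sort after "3.10.0".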

Step 4: Land the Gate with the Fleet

CI goes red until all agents are remediated in the same branch. The gate and the fleet land together. A gate you merge before the fleet conforms is just a broken main with a TODO.
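A fleet-level gate might look like the sketch below, which reports every mismatch and returns a non-zero status if any exist. It assumes agent files are Markdown with `---`-delimited frontmatter under `.claude/agents`, and that MCP tools follow the `mcp__server__tool` pattern. The function names and report format are hypothetical.

```python
import re
from pathlib import Path

_FENCED = re.compile(r"^(`{3,})[^\n]*\n.*?^\1[ \t]*$\n?", re.DOTALL | re.MULTILINE)
_MCP_TOOL = re.compile(r"\bmcp__[A-Za-z0-9_]+")  # assumed naming pattern


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split '---' delimited frontmatter from the body; ('', text) if absent."""
    if text.startswith("---\n"):
        head, sep, body = text[4:].partition("\n---\n")
        if sep:
            return head, body
    return "", text


def failures(agents: dict[str, str]) -> list[str]:
    """Return one line per mismatch across all agents, in a stable order."""
    problems = []
    for name, text in sorted(agents.items()):
        frontmatter, body = split_frontmatter(text)
        declared = set(_MCP_TOOL.findall(frontmatter))
        used = set(_MCP_TOOL.findall(_FENCED.sub("", body)))
        problems += [f"{name}: undeclared {tool}" for tool in sorted(used - declared)]
        problems += [f"{name}: overgranted {tool}" for tool in sorted(declared - used)]
    return problems


def main(agent_dir: str = ".claude/agents") -> int:
    """Read every agent file, print mismatches, return a CI exit status."""
    paths = Path(agent_dir).glob("*.md")
    agents = {p.name: p.read_text(encoding="utf-8") for p in paths}
    problems = failures(agents)
    for line in problems:
        print(line)
    return 1 if problems else 0

# In CI, the entry point would call: sys.exit(main())
```

Because the status stays non-zero until the last agent is fixed, the gate and the remediation have to land in the same branch.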

Use a two-pass approach:

Green pass: Dumb, deterministic remediation — every agent gets a full canonical tool set, scaffolded tags, version: 1.0.0. Nothing clever, just enough to satisfy the new required set.
A-grade pass: Narrow to least privilege — rewrite description into capability + "Use when…" + trigger phrases, replace scaffolded tags with real lowercase topic tags, narrow tools from full-canonical to least-privilege (median: 5 tools, zero agents retaining the full 10-tool set).
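The two passes can be sketched as two small functions. Everything specific here is an assumption: the source mentions a 10-tool canonical set but does not list it, so `CANONICAL_TOOLS` is a hypothetical stand-in, the scaffold tag is a placeholder, and this green pass only fills fields that are missing. The set of tools an agent needs is assumed to come from a human review, not from this code.

```python
# Hypothetical canonical set; the article does not list the actual 10 tools.
CANONICAL_TOOLS = [
    "Read", "Write", "Edit", "Bash", "Glob",
    "Grep", "WebFetch", "WebSearch", "Task", "TodoWrite",
]


def green_pass(frontmatter: dict) -> dict:
    """Pass 1: fill missing required fields with deterministic defaults."""
    fixed = dict(frontmatter)
    fixed.setdefault("tools", list(CANONICAL_TOOLS))
    fixed.setdefault("version", "1.0.0")
    fixed.setdefault("tags", ["scaffold"])  # placeholder, replaced in pass 2
    return fixed


def narrow_tools(frontmatter: dict, needed: set[str]) -> dict:
    """Pass 2: keep only the tools a reviewer confirmed the agent needs."""
    narrowed = dict(frontmatter)
    narrowed["tools"] = [t for t in frontmatter["tools"] if t in needed]
    return narrowed


agent = green_pass({"name": "log-reader", "description": "Reads logs"})
print(len(agent["tools"]))  # 10

agent = narrow_tools(agent, {"Read", "Grep"})
print(agent["tools"])  # ['Read', 'Grep']
```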

Why This Matters for Claude Code Users

If you maintain multiple agent files (in a team, repo, or organization)

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.