How to Cut Agent Token Waste: CLI Over GraphQL + Server-Pushed Hints

Replace raw GraphQL with typed CLI commands to eliminate JSON assembly errors, then add server-pushed hints via MCP to prevent judgment failures. Your agent burns 1,500+ tokens per operation otherwise.

Ala SMITH & AI Research Desk · Jun 9, 2026 · 4 min read · AI-Generated

Source: dev.to via devto_mcp, gn_mcp_protocol · Multi-Source

How do I reduce token waste from my Claude Code agent's API calls?

Replace raw GraphQL/API calls with a typed CLI wrapper. This eliminates JSON assembly errors that burn 1,500+ tokens per operation. Then add server-pushed hints to prevent judgment failures like forgetting to use uploaded assets.

TL;DR

Stop your agent from burning tokens on JSON assembly errors by replacing raw API calls with typed CLI commands and proactive server hints.

The Problem: Your Agent Is Bleeding Tokens on JSON Assembly

You designed the perfect architecture — direct API calls, no MCP overhead, a clean SKILL.md behavior spec. The agent calls your GraphQL endpoint with curl, reads your docs, and executes. Elegant.

Then you watch the token counter. A single upload operation that should cost ~200 tokens burns 1,500+. Why? The agent is guessing JSON field formats wrong, getting GraphQL errors, fetching docs across multiple pages to figure out the correct format, and retrying. Every. Single. Time.

This isn't a documentation problem. It's a structural problem: LLMs are fundamentally bad at assembling nested JSON payloads from scratch. You can fix your docs a hundred times and the agent will find a new field to misformat.

The Fix: Typed CLI Arguments

Instead of making the agent assemble raw JSON in curl commands, wrap your API in a CLI with typed arguments:

# Before: agent assembles raw JSON in curl
curl -X POST /graphql -d '{"query":"mutation { uploadAsset(input: { shotId: \"...\", type: \"start_frame\", provenance: { method: \"ai_generated\", model: \"gpt-image-2\", prompt: \"...\" } }) { id } }"}'

# After: typed CLI arguments, zero JSON assembly
python3 nl.py upload <shotId> start_frame frame.png --method ai_generated --model "gpt-image-2" --prompt "Winter city street"

This removes the JSON-assembly error-recovery loop: the agent passes flags, not JSON, and the CLI dispatcher handles type conversion and builds the request itself.
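
As a rough illustration, a wrapper along these lines could be built with Python's argparse. The subcommand and flags mirror the command above. Everything else is an assumption made for this sketch and not the source project's code: the `build_upload_payload` helper, the allowed values in `ASSET_TYPES` and `METHODS`, the `UploadAssetInput` type name, and the use of GraphQL variables.

```python
import argparse
import json

# Hypothetical allowed values; the real schema may define others.
ASSET_TYPES = ("start_frame", "end_frame", "reference")
METHODS = ("ai_generated", "uploaded")

# Assumed mutation shape; "UploadAssetInput" is an illustrative type name.
UPLOAD_MUTATION = (
    "mutation($input: UploadAssetInput!) "
    "{ uploadAsset(input: $input) { id } }"
)


def build_parser() -> argparse.ArgumentParser:
    """Define the typed interface the agent sees instead of raw JSON."""
    parser = argparse.ArgumentParser(prog="nl.py")
    sub = parser.add_subparsers(dest="command", required=True)
    upload = sub.add_parser("upload", help="upload an asset for a shot")
    upload.add_argument("shot_id")
    upload.add_argument("asset_type", choices=ASSET_TYPES)
    upload.add_argument("file")
    upload.add_argument("--method", choices=METHODS, required=True)
    upload.add_argument("--model")
    upload.add_argument("--prompt")
    return parser


def build_upload_payload(args: argparse.Namespace) -> dict:
    """Assemble the GraphQL request body so the agent never has to."""
    provenance = {"method": args.method}
    if args.model:
        provenance["model"] = args.model
    if args.prompt:
        provenance["prompt"] = args.prompt
    return {
        "query": UPLOAD_MUTATION,
        "variables": {
            "input": {
                "shotId": args.shot_id,
                "type": args.asset_type,
                "provenance": provenance,
            }
        },
    }


# Example invocation with an explicit argument list. Sending the request
# and transferring the file itself are omitted from this sketch.
example = build_parser().parse_args(
    ["upload", "shot-1", "start_frame", "frame.png",
     "--method", "ai_generated", "--model", "gpt-image-2"]
)
print(json.dumps(build_upload_payload(example), indent=2))
```

With `choices` in place, a mistyped asset type fails immediately with a usage message listing the valid values, so the agent gets its correction in one short error instead of a round trip through the docs.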

Bonus: One CLI, Two Audiences

Add a --json flag so the same CLI serves both the agent (structured data) and you (human-readable output):

# For the agent: structured JSON for parsing
python3 nl.py overview <noteId> --json

# For you watching: readable progress
python3 nl.py overview <noteId>
# Episode 01: The Algorithm Hunter
#   [===done===|--review--|......not_started.......] 3/12
#   Shot   Status       Rolls    Best   PF
#   01A    done         3        48     Y
#   01B    review       2        41     Y

The Next Level: Server-Pushed Hints

The CLI fixes execution errors. But your agent can still make bad decisions: re-rolling without changing prompts, forgetting to use uploaded assets, skipping status updates. These are judgment failures, not execution failures.

The solution: let your server push hints to the agent proactively. When the server detects an impending mistake (e.g., a prompt written without referencing available assets), it injects a hint:

ctx.pendingHints.push({
  type: "available_refs",
  priority: "high",
  message: `Available refs for prompting: ${refs.map(r => `@${r.filename} (${r.assetType})`).join(", ")}`,
  metadata: { targetId: shot.id, refs },
});

This catches failures before they happen. The agent doesn't have to remember everything — the server nudges it at the critical moment.
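
The detection step behind such a hint might look like the following Python sketch (Python is used here to match the CLI). The hint fields follow the snippet above. The rule itself (flag a prompt that mentions none of the available `@filename` references), the `Ref` and `Context` classes, and the `attach_hints` helper with its `hints` response key are all assumptions, not the source project's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Ref:
    filename: str
    asset_type: str


@dataclass
class Context:
    # Hints queued for delivery with the next response to the agent.
    pending_hints: list[dict] = field(default_factory=list)


def check_prompt_uses_refs(ctx: Context, shot_id: str, prompt: str,
                           refs: list[Ref]) -> None:
    """Queue a hint when a prompt ignores every available reference."""
    if not refs:
        return
    if any(f"@{ref.filename}" in prompt for ref in refs):
        return
    listing = ", ".join(f"@{ref.filename} ({ref.asset_type})" for ref in refs)
    ctx.pending_hints.append({
        "type": "available_refs",
        "priority": "high",
        "message": f"Available refs for prompting: {listing}",
        "metadata": {
            "targetId": shot_id,
            "refs": [ref.filename for ref in refs],
        },
    })


def attach_hints(ctx: Context, response: dict) -> dict:
    """Return a copy of the response with queued hints, then clear the queue."""
    out = dict(response)
    out["hints"] = list(ctx.pending_hints)
    ctx.pending_hints.clear()
    return out


ctx = Context()
refs = [Ref("city.png", "start_frame")]
check_prompt_uses_refs(ctx, "shot-1", "Winter city street", refs)
print(attach_hints(ctx, {"ok": True}))
```

Draining the queue on delivery keeps each hint from being repeated on every later response, which would otherwise spend the tokens the hint was meant to save.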

How to Apply This to Your Claude Code Workflow

Audit your agent's token waste: Watch for error→doc→retry loops. If you see them, the fix isn't better docs — it's eliminating the assembly step.
Build a CLI wrapper: Create a typed CLI for your API. Even a simple Python script with argparse is enough. Route every command through it (34 in the source example), not just the ones that fail most often.
Add server-pushed hints: After each operation, check for common judgment failures and inject hints before the agent's next action. Ask your agent: "Would a nudge here have prevented this?"
Iterate with reflection: Pause production periodically and ask your agent what gaps in your behavior spec caused inefficient actions. Fix those gaps. Repeat.
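
The audit in the first step can be approximated with a small script over whatever step log your agent harness produces. The log format below (a list of records with `kind` and `tokens` fields, and the kinds `call`, `error`, `doc_fetch`, and `retry`) is hypothetical; adapt it to the data you actually have.

```python
def recovery_loops(steps: list[dict]) -> tuple[int, int]:
    """Count error -> doc fetch -> retry loops and the tokens they consumed.

    A loop starts at an "error" step and ends at the next "retry" step.
    Tokens from the error, any doc fetches in between, and the retry
    are all counted as recovery cost.
    """
    loops = 0
    wasted = 0
    pending = None  # tokens accumulated since an unrecovered error
    for step in steps:
        kind, tokens = step["kind"], step["tokens"]
        if kind == "error":
            pending = (pending or 0) + tokens
        elif pending is not None and kind in ("doc_fetch", "retry"):
            pending += tokens
            if kind == "retry":
                loops += 1
                wasted += pending
                pending = None
    return loops, wasted


# Illustrative log only; the token counts are made up for the example.
log = [
    {"kind": "call", "tokens": 200},
    {"kind": "error", "tokens": 150},
    {"kind": "doc_fetch", "tokens": 600},
    {"kind": "doc_fetch", "tokens": 500},
    {"kind": "retry", "tokens": 250},
    {"kind": "call", "tokens": 200},
]
loops, wasted = recovery_loops(log)
print(f"{loops} recovery loop(s), {wasted} tokens spent recovering")
```

If the recovery total dominates the cost of the successful calls, that is the signal to move the operation behind a typed command.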

Why This Works

Typed CLI arguments are validated before anything reaches the API: no JSON assembly, no field guessing, no error-recovery loops.
Server-pushed hints are cheaper than error recovery — injecting a hint costs ~50 tokens; recovering from a wrong decision costs 1,500+.
Your agent is your best auditor — it knows exactly where your spec failed it. Just ask.
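
The cost comparison can be made concrete with a short calculation using the article's round figures (about 50 tokens per hint, 1,500+ per recovery, about 200 for a clean operation). It assumes, for simplicity, that a hint is sent on every operation and always prevents the failure it targets; real numbers will differ.

```python
HINT_COST = 50          # tokens per injected hint (article's estimate)
RECOVERY_COST = 1500    # tokens per failed-then-recovered operation
CLEAN_CALL_COST = 200   # tokens for an operation that succeeds first time

# Failure rate above which always sending the hint pays for itself.
break_even_rate = HINT_COST / RECOVERY_COST


def expected_saving(failure_rate: float) -> float:
    """Expected tokens saved per operation by always sending the hint."""
    return failure_rate * RECOVERY_COST - HINT_COST


print(f"Break-even failure rate: {break_even_rate:.1%}")
print(f"Recovery vs clean call: {RECOVERY_COST / CLEAN_CALL_COST}x")
print(f"Saving at a 10% failure rate: {expected_saving(0.10):.0f} tokens")
```

Under these assumptions the hint pays off whenever the mistake would otherwise occur in more than roughly 1 in 30 operations.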

This isn't about avoiding MCP. It's about recognizing that the real work starts after the architecture is in place. The agent needs guardrails, not just documentation.

Source: gentic.news · Jun 9, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

Claude Code users should immediately audit their agent's token usage for error-recovery loops. If you see the agent fetching docs and retrying API calls, the fix is not better documentation — it's eliminating the JSON assembly step entirely. Build a typed CLI wrapper for your API, even if it's just a simple Python script. This single change can cut token usage per operation by 7x or more. Second, implement server-pushed hints for judgment failures. After each operation, check for common mistakes (forgetting to use uploaded assets, skipping status updates) and inject hints proactively. This is cheaper than letting the agent fail and recover. Use the reflection loop: pause production, ask your agent what gaps in your behavior spec caused problems, fix those gaps, and repeat. Your agent knows exactly where it failed — you just have to ask.
