A 1,000-line CLAUDE.md burns 7,000–10,000 tokens per turn in Claude Code. Most of those tokens restate things the model can already infer from the codebase itself, according to a detailed analysis by developer Abd Rahman Saber Abdo.
Key facts
- 1,000-line CLAUDE.md = 7,000-10,000 tokens per turn.
- Skills load ~50 tokens vs 7K+ for CLAUDE.md.
- 95% of agent setups don't need a CLAUDE.md.
- Model at 85% context utilization is measurably worse than at 30%.
- Recursive skill building: workflow → run → skill → failure → fix.
The ritual is everywhere in AI coding agents: you create a CLAUDE.md and list your framework, your coding style, your preferred response tone. It feels productive. But it's likely making your agent dumber and wasting money, argues developer Abd Rahman Saber Abdo in a technical deep-dive [According to the source].
The Token Math
When you have a CLAUDE.md file, its entire contents get injected into the context window at the start of every single conversation turn. Not just once — every turn. A 1,000-line file is roughly 7,000–10,000 tokens. Multiply that across a long session with 20–30 back-and-forth exchanges, and you've burned a meaningful chunk of your available context on instructions the model already knows. When you write "This project uses React" in your CLAUDE.md, you're telling the model something it can figure out by looking at your package.json. When you write "use TypeScript" in a project full of .ts files, you're burning tokens to state the obvious [According to the source].
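The arithmetic above can be sketched directly. The midpoint and turn count below are illustrative assumptions drawn from the source's ranges, not measurements:

```python
# Back-of-envelope cost of a static CLAUDE.md that is re-injected
# every turn. Figures are assumptions based on the source's ranges.
CLAUDE_MD_TOKENS = 8_500  # midpoint of the 7,000-10,000 estimate
TURNS = 25                # a long session, per the 20-30 range

total = CLAUDE_MD_TOKENS * TURNS  # paid again on every exchange
print(f"{total:,} tokens spent restating static instructions")  # 212,500
```

Over a single long session, that is roughly two hundred thousand tokens spent on content that never changes.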
The 95% case — the overwhelming majority of agent setups — doesn't need a CLAUDE.md at all. The exception is when you have genuinely proprietary information: your company's internal methodology, a specific domain convention that's not derivable from the code, something the model has no way to infer. That's the 5% [According to the source].
Skills and Progressive Disclosure
A skill file looks similar to a CLAUDE.md on the surface — it's a markdown file with a name, a description, and a body of detailed instructions. The difference is how it's loaded into context. With a CLAUDE.md, the full content is always present. With a skill, only the name and description are injected into context. The body stays on disk until the agent decides it's relevant. So instead of burning 7,000 tokens on every turn, you're burning maybe 50 — two sentences that tell the agent "this skill exists, and here's roughly what it's for." When the agent encounters a task that matches that description, it reads the rest. Otherwise, it doesn't [According to the source].

That's progressive disclosure. The agent reaches for what it needs rather than carrying everything all the time.
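A minimal sketch of that loading split, assuming the common skill layout (YAML front matter carrying `name` and `description`, followed by a markdown body). The skill content itself is invented for illustration:

```python
# Split a skill file into the always-resident header (front matter)
# and the on-demand body. Only the header costs tokens every turn.
SKILL_FILE = """---
name: release-notes
description: Draft release notes from merged PRs using our house style.
---
1. List PRs merged since the last tag.
2. Group changes by area (api, ui, infra).
3. Write one bullet per user-facing change, in past tense.
"""

def split_skill(text: str) -> tuple[str, str]:
    """Return (front matter, body) for a '---'-delimited skill file."""
    _, header, body = text.split("---", 2)
    return header.strip(), body.strip()

header, body = split_skill(SKILL_FILE)
print(header)  # the ~50-token part the agent always sees
```

The body never enters context until the agent matches a task against the description and reads the rest from disk.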
The Biggest Mistake: Writing Skills Before Teaching
A skill written without a recorded successful run is just guesswork. The agent has no reference for what "good" looks like in your specific context. It'll follow the steps mechanically, encounter edge cases you didn't anticipate, and either fail silently or hallucinate its way through gaps in the instructions [According to the source].

The right sequence: identify the workflow; walk through it with the agent step by step; once you have a successful run, ask the agent to create the skill from it; then keep iterating. This is recursive skill building: workflow → successful run → skill → failure → fix → update skill → repeat [According to the source].
Context Window Hygiene
Model quality degrades as the context window fills. A fresh context window is fast and sharp; as it fills — with file contents, tool calls, multi-turn conversation history — performance slips. Not dramatically, but noticeably. The model at 85% context utilization is measurably worse than the same model at 30%. A bloated CLAUDE.md accelerates this: you're filling context with static instructions before the real work even starts. Skills keep that space available for what actually matters: the live conversation, the relevant code, the tool outputs [According to the source].

The mental model: treat your context window like working memory. Don't fill it with things that can be looked up. Fill it with things that are happening right now [According to the source].
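The working-memory framing can be put in rough numbers. The window size and per-workflow sizes below are assumptions for illustration; only the ~50-token header figure comes from the source:

```python
# Resident cost at turn zero: ten workflows inlined into CLAUDE.md
# versus kept as skills with only headers loaded. All sizes assumed.
WINDOW = 200_000        # assumed context window size
N_WORKFLOWS = 10
HEADER_TOKENS = 50      # per the source's ~50-token skill header figure
BODY_TOKENS = 2_000     # assumed size of each full instruction set

as_skills = N_WORKFLOWS * HEADER_TOKENS   # headers only: 500 tokens
as_claude_md = N_WORKFLOWS * BODY_TOKENS  # everything inlined: 20,000

print(f"skills: {as_skills / WINDOW:.2%} of the window before any work")
print(f"CLAUDE.md: {as_claude_md / WINDOW:.2%} of the window before any work")
```

Under these assumptions, the inlined version spends 10% of working memory before the first message, the skills version a quarter of a percent.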
On Sub-Agents and Scaling
The same logic applies to multi-agent setups. It's tempting to spin up five sub-agents on day one because the architecture looks impressive. But each of those agents needs its own context, its own skills, its own successful runs before it's genuinely useful. The more productive path: start with one agent doing everything; build skills for your recurring workflows individually, with real runs; once a workflow is stable and isolated enough, hand it to a dedicated sub-agent. That sub-agent inherits skills with actual successful runs behind them [According to the source].
A single well-configured agent with five solid skills will outperform a complex multi-agent setup where nobody took the time to teach the agents anything [According to the source].
What to watch
Watch for Claude Code and Cursor to adopt progressive disclosure as a default pattern in their agent architectures. The next major release from either tool that ships a 'skills' system with lazy loading will validate this approach at scale. Also track token usage benchmarks from agent evaluation suites like SWE-bench.