A 1,000-line CLAUDE.md burns 7,000–10,000 tokens per turn in Claude Code. Most of those tokens restate things the model can already infer from the codebase itself, according to a detailed analysis by developer Abd Rahman Saber Abdo.
Key facts
- 1,000-line CLAUDE.md = 7,000-10,000 tokens per turn.
- Skills load ~50 tokens vs 7K+ for CLAUDE.md.
- 95% of agent setups don't need a CLAUDE.md.
- Model at 85% context utilization is measurably worse than at 30%.
- Recursive skill building: workflow → run → skill → failure → fix.
The ritual is everywhere in AI coding agents: you create a CLAUDE.md and list your framework, your coding style, your preferred response tone. It feels productive. But it's likely making your agent dumber and wasting money, argues developer Abd Rahman Saber Abdo in a technical deep-dive [According to the source].
The Token Math
When you have a CLAUDE.md file, its entire contents get injected into the context window at the start of every single conversation turn. Not just once — every turn. A 1,000-line file is roughly 7,000–10,000 tokens. Multiply that across a long session with 20–30 back-and-forth exchanges, and you've burned a meaningful chunk of your available context on instructions the model already knows. When you write "This project uses React" in your CLAUDE.md, you're telling the model something it can figure out by looking at your package.json. When you write "use TypeScript" in a project full of .ts files, you're burning tokens to state the obvious [According to the source].
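The arithmetic above can be sketched directly. The midpoint and turn count below are illustrative assumptions drawn from the source's ranges, not measurements:

```python
# Back-of-envelope cost of a static CLAUDE.md that is re-injected
# every turn. Figures are assumptions based on the source's ranges.
CLAUDE_MD_TOKENS = 8_500  # midpoint of the 7,000-10,000 estimate
TURNS = 25                # a long session, per the 20-30 range

total = CLAUDE_MD_TOKENS * TURNS  # paid again on every exchange
print(f"{total:,} tokens spent restating static instructions")  # 212,500
```

Over a single long session, that is roughly two hundred thousand tokens spent on content that never changes.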
The 95% case — the overwhelming majority of agent setups — doesn't need a CLAUDE.md at all. The exception is when you have genuinely proprietary information: your company's internal methodology, a specific domain convention that's not derivable from the code, something the model has no way to infer. That's the 5% [According to the source].
Skills and Progressive Disclosure
A skill file looks similar to a CLAUDE.md on the surface — it's a markdown file with a name, a description, and a body of detailed instructions. The difference is how it's loaded into context. With a CLAUDE.md, the full content is always present. With a skill, only the name and description are injected into context. The body stays on disk until the agent decides it's relevant. So instead of burning 7,000 tokens on every turn, you're burning maybe 50 — two sentences that tell the agent "this skill exists, and here's roughly what it's for." When the agent encounters a task that matches that description, it reads the rest. Otherwise, it doesn't [According to the source].

That's progressive disclosure. The agent reaches for what it needs rather than carrying everything all the time.
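A minimal sketch of that loading split, assuming the common skill layout (YAML front matter carrying `name` and `description`, followed by a markdown body). The skill content itself is invented for illustration:

```python
# Split a skill file into the always-resident header (front matter)
# and the on-demand body. Only the header costs tokens every turn.
SKILL_FILE = """---
name: release-notes
description: Draft release notes from merged PRs using our house style.
---
1. List PRs merged since the last tag.
2. Group changes by area (api, ui, infra).
3. Write one bullet per user-facing change, in past tense.
"""

def split_skill(text: str) -> tuple[str, str]:
    """Return (front matter, body) for a '---'-delimited skill file."""
    _, header, body = text.split("---", 2)
    return header.strip(), body.strip()

header, body = split_skill(SKILL_FILE)
print(header)  # the ~50-token part the agent always sees
```

The body never enters context until the agent matches a task against the description and reads the rest from disk.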
The Biggest Mistake: Writing Skills Before Teaching
A skill written without a recorded successful run is just guesswork. The agent has no reference for what "good" looks like in your specific context. It'll follow the steps mechanically, encounter edge cases you didn't anticipate, and either fail silently or hallucinate its way through gaps in the instructions [According to the source].

The right sequence: identify the workflow; walk through it with the agent step by step; once you have a successful run, ask the agent to create the skill from it; then keep iterating. This is recursive skill building: workflow → successful run → skill → failure → fix → update skill → repeat [According to the source].
Context Window Hygiene
Model quality degrades as the context window fills. A fresh context window is fast and sharp; as it fills — with file contents, tool calls, multi-turn conversation history — performance slips. Not dramatically, but noticeably. The model at 85% context utilization is measurably worse than the same model at 30%. A bloated CLAUDE.md accelerates this: you're filling context with static instructions before the real work even starts. Skills keep that space available for what actually matters: the live conversation, the relevant code, the tool outputs [According to the source].

The mental model: treat your context window like working memory. Don't fill it with things that can be looked up. Fill it with things that are happening right now [According to the source].
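The working-memory framing can be put in rough numbers. The window size and per-workflow sizes below are assumptions for illustration; only the ~50-token header figure comes from the source:

```python
# Resident cost at turn zero: ten workflows inlined into CLAUDE.md
# versus kept as skills with only headers loaded. All sizes assumed.
WINDOW = 200_000        # assumed context window size
N_WORKFLOWS = 10
HEADER_TOKENS = 50      # per the source's ~50-token skill header figure
BODY_TOKENS = 2_000     # assumed size of each full instruction set

as_skills = N_WORKFLOWS * HEADER_TOKENS   # headers only: 500 tokens
as_claude_md = N_WORKFLOWS * BODY_TOKENS  # everything inlined: 20,000

print(f"skills: {as_skills / WINDOW:.2%} of the window before any work")
print(f"CLAUDE.md: {as_claude_md / WINDOW:.2%} of the window before any work")
```

Under these assumptions, the inlined version spends 10% of working memory before the first message, the skills version a quarter of a percent.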
On Sub-Agents and Scaling
The same logic applies to multi-agent setups. It's tempting to spin up five sub-agents on day one because the architecture looks impressive. But each of those agents needs its own context, its own skills, its own successful runs before it's genuinely useful. The more productive path: start with one agent doing everything; build skills for your recurring workflows individually, with real runs; once a workflow is stable and isolated enough, hand it to a dedicated sub-agent. That sub-agent inherits skills with actual successful runs behind them [According to the source].
A single well-configured agent with five solid skills will outperform a complex multi-agent setup where nobody took the time to teach the agents anything [According to the source].
What to watch
Watch for Claude Code and Cursor to adopt progressive disclosure as a default pattern in their agent architectures. The next major release from either tool that ships a 'skills' system with lazy loading will validate this approach at scale. Also track token usage benchmarks from agent evaluation suites like SWE-bench.