How to Build a Multi-Agent Dev System: One Developer's 40-Commit Field Report

A developer's two-week field report reveals how CLAUDE.md, knowledge graph corrections, and multi-agent workflows create compounding productivity gains.

Alex Martin & AI Research Desk · 4 min read · AI-Generated
Source: dev.to via devto_claudecode · Corroborated: gn_claude_code_tips, medium_anthropic, hn_claude_code, reddit_claude, medium_claude

The Technique — Configuring Memory, Not Just Code

After two weeks and 40 commits across multiple projects, one developer discovered Claude Code's real power isn't in generating code—it's in remembering preferences across sessions. The breakthrough came from systematically using Claude Code's two memory systems:

File-Based Memory (CLAUDE.md) — Project-specific rules that persist across sessions. The developer's CLAUDE.md includes:

## Code Style
- TypeScript strict mode, no `any`
- `interface` over `type` for object shapes
- Functional components with named exports
- Files under 200 lines

## Git Workflow
- Always use conventional commits (`feat:`, `fix:`, `chore:`)
- Never commit `.env` files
- Never force push to main

## Testing
- AAA pattern (Arrange, Act, Assert)
- Mock external dependencies only
- Test WHAT it does, not HOW
- Run single tests during development

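To make the style section above concrete, here is a minimal TypeScript sketch that follows those rules (strict typing with no `any`, `interface` for object shapes, named exports). The `User` interface and `formatUser` function are hypothetical names for illustration, not taken from the developer's project.

```typescript
// Hypothetical example applying the CLAUDE.md style rules above:
// no `any`, `interface` over `type` for object shapes, named exports.

export interface User {
  id: number;
  name: string;
}

// Named export with explicit parameter and return types.
export function formatUser(user: User): string {
  return `#${user.id} ${user.name}`;
}

console.log(formatUser({ id: 7, name: "Ada" })); // prints "#7 Ada"
```

Because these are written as rules rather than requests, Claude Code applies them to every file it touches in the project without being re-prompted.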
Knowledge Graph (Cross-Project) — Corrections that carry between projects via MCP. When the developer corrected Claude Code's behavior once (such as telling it to avoid words like "delve" or "tapestry" in generated content), that preference was stored globally.

Why It Works — The Compounding Effect

Each correction, each preference saved, each rule written makes the next session marginally better. Over two weeks, those margins stack up. The AI doesn't get smarter—your configuration gets better.
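The compounding claim can be made concrete with back-of-envelope arithmetic. The 1%-per-session figure below is purely an assumption for illustration, not a measurement from the article:

```typescript
// Illustration only: if each saved correction makes the next session
// 1% more effective, gains compound multiplicatively across sessions.
const perSessionGain = 0.01; // assumed, not measured
const sessions = 40;         // roughly one per commit in the field report

const relative = Math.pow(1 + perSessionGain, sessions);
console.log(relative.toFixed(2)); // prints "1.49"
```

Even a tiny per-session improvement, held in configuration rather than in your head, roughly halves again the effort by session 40.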

Most developers make two critical mistakes:

  1. Tolerating mediocre output — They either accept slightly wrong code or regenerate hoping for better
  2. Putting probabilistic trust where they need deterministic rules — Asking agents to follow standards instead of enforcing them in CLAUDE.md

The correct approach: explicitly correct the AI and explain why. "Don't mock the database in these tests—we got burned when mocks diverged from the real schema." This creates a feedback loop that prevents recurrence.
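Where a rule can be checked mechanically, it is worth going one step further than a CLAUDE.md instruction and encoding it as an actual check, for example in a commit-msg hook. This validator for the conventional-commit rule is a minimal sketch; the function name and the set of accepted commit types are assumptions, not from the article:

```typescript
// Sketch: enforce conventional commits deterministically instead of
// trusting the agent to follow them. Hypothetical helper.
const CONVENTIONAL = /^(feat|fix|chore|docs|refactor|test|perf)(\([\w-]+\))?!?: .+/;

export function isConventionalCommit(message: string): boolean {
  // Only the first line of the message carries the type/scope prefix.
  return CONVENTIONAL.test(message.split("\n")[0]);
}

console.log(isConventionalCommit("feat(auth): add session refresh")); // true
console.log(isConventionalCommit("updated stuff"));                   // false
```

A check like this turns "please follow the standard" into a hard gate: the agent's output either passes or is rejected, with no judgment call involved.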

How To Apply It — Multi-Agent Workflows That Scale

1. Sub-Agents for Exploration, Main Context for Implementation

When you need to understand unfamiliar code, spawn a sub-agent to explore while keeping your main context window clean:

# In Claude Code chat
"Create a sub-agent to explore the authentication module and trace dependencies. Report back with a summary."

This prevents context window pressure from accumulating exploration noise.

2. Plans as Files, Not Chat

Always save implementation plans to markdown files:

# Instead of chat output
"Write the implementation plan to plan.md using SPARC methodology:
- Specification
- Pseudocode
- Architecture
- Refinement
- Completion"

Plans survive context compression and can be reviewed by sub-agents before implementation.

3. Multi-Agent Code Review

Run parallel review agents with narrow mandates:

# After writing code
"Create three review agents in parallel:
1. Security review - check for unsanitized input, hardcoded secrets
2. Architecture review - check abstraction levels, dependency injection
3. Simplification review - flag over-engineering, single-use abstractions"

Three agents reviewing simultaneously take roughly the same wall-clock time as one.

4. Git Worktrees for Parallel Feature Work

Use git worktree to maintain separate working directories:

git worktree add ../feature-branch feature-branch
cd ../feature-branch
# Claude Code works here while you review another branch

This creates genuinely parallel workflows without branch-switching overhead.

Where It Breaks Down (And How To Fix It)

Over-Engineering Tendency — Claude Code often proposes complex solutions when simple ones exist. The antidote: ask "what's the simplest version of this?" early and often.

Context Window Pressure — Long sessions (3+ hours) get sluggish. Workaround: break large tasks into multiple sessions with clear scopes, persisting plans to files between sessions.

Testing Implementation vs. Behavior — Claude Code occasionally writes tests that test implementation rather than behavior. Fix: include "test WHAT it does, not HOW" in your CLAUDE.md rules.
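The "test WHAT it does, not HOW" rule, combined with the AAA pattern from the CLAUDE.md above, can be sketched as follows. The `applyDiscount` function is a hypothetical example, not from the article:

```typescript
// Hypothetical function under test.
export function applyDiscount(price: number, percent: number): number {
  return Math.round(price * (1 - percent / 100) * 100) / 100;
}

// Arrange
const price = 200;
// Act
const discounted = applyDiscount(price, 15);
// Assert: check the observable result (WHAT it does)...
if (discounted !== 170) throw new Error("expected 170");
// ...not the internals. Asserting that Math.round was called would test
// HOW it works, and break under harmless refactors.
console.log("behavior test passed");
```

A behavior-level test like this survives any rewrite of the rounding logic, which is exactly why the rule belongs in CLAUDE.md rather than in per-session prompts.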

The Cross-AI Audit System

The developer's most advanced pattern: using Claude Code for implementation and Gemini CLI for auditing. One AI shouldn't review its own work—different training data catches different issues.

System setup:

  • Claude Code (Opus 4.6) = fast implementer with 4 parallel sessions
  • Gemini CLI (3.1 Pro) = strict auditor reviewing every piece of code
  • Neither merges to main without the other's sign-off

This creates the same cross-review benefit as having two developers, without the coordination overhead.

AI Analysis

Claude Code users should immediately implement three changes:

1. **Create a comprehensive CLAUDE.md file today** — Spend 30 minutes documenting your code style, git workflow, testing standards, and security rules. That half hour of setup will save hundreds of corrections. Include specific rules like "test WHAT it does, not HOW" and "never mock the database in tests."
2. **Start correcting, not tolerating** — When Claude Code generates slightly wrong code, don't accept it or regenerate. Explicitly correct it with reasoning: "Don't use that approach because [specific reason]." These corrections get stored in the knowledge graph and prevent recurrence across projects.
3. **Use sub-agents for exploration** — Before diving into implementation, spawn a sub-agent with `"Create a sub-agent to explore [module] and report back with dependencies and architecture."` Keep your main context window clean for actual implementation work.

Bonus: If you have access to multiple AI tools (like Gemini CLI), set up a cross-review system where one implements and the other audits. Different training data catches different issues.