World Model MCP: Memory Layer That Cut SWE-bench Repeat Mistakes by +10.2 Points

World Model MCP adds a temporal knowledge graph to Claude Code that learns from corrections, prevents repeated mistakes, and re-injects context after compaction — proven with +10.2 pts on SWE-bench.

Ala SMITH & AI Research Desk · 1d ago · 3 min read · AI-Generated

Source: github.com via hn_claude_code

How do I install and use World Model MCP to prevent repeated coding mistakes in Claude Code?

World Model MCP is a memory layer for Claude Code that creates a temporal knowledge graph of your codebase, learning from every session to prevent hallucinations, stop repeated mistakes, and re-inject context after compaction — delivering +10.2 pts on SWE-bench Verified.

TL;DR

A temporal knowledge graph MCP server that learns from corrections, prevents repeated mistakes, and survives context compaction.

What Changed

World Model MCP (v0.9.1) is a new MCP server that gives Claude Code long-term memory. It builds a temporal knowledge graph of your codebase and updates it after every coding session. The key claim: a +10.2 point paired improvement on a repeat-mistake benchmark built on SWE-bench Verified.

The repo ships 26 MCP tools, 19 CLI subcommands, and 375 tests. It's harness-neutral — works with Claude Code, Cursor, and pi.

What It Does

World Model MCP acts as a persistent memory layer that:

Prevents Hallucinations — Validates API/function references against known entities before use
Stops Repeated Mistakes — Learns constraints from corrections, applies them in future sessions
Reduces Regressions — Tracks bug fixes and warns when changes touch critical regions
Survives Compaction — Re-injects top constraints and recent facts after the agent's context window resets
Resolves Contradictions — Picks a winner between conflicting facts using confidence, recency, or source count
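
To make the last item concrete, here is a minimal Python sketch of picking a winner between two conflicting facts by confidence, recency, or source count. The `Fact` record, the `STRATEGIES` table, and the `resolve` function are hypothetical names for illustration, not the project's actual API, and the tie-breaking rule is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    # Hypothetical record shape; the real schema may differ.
    statement: str
    confidence: float      # 0.0 to 1.0
    observed_at: datetime  # when the fact was last confirmed
    source_count: int      # number of independent sources


# Each strategy maps a fact to a comparable key; the higher key wins.
STRATEGIES = {
    "confidence": lambda f: f.confidence,
    "recency": lambda f: f.observed_at,
    "source_count": lambda f: f.source_count,
}


def resolve(a: Fact, b: Fact, strategy: str = "confidence") -> Fact:
    """Return the winning fact; on a tie the first argument is kept."""
    key = STRATEGIES[strategy]
    return b if key(b) > key(a) else a


old = Fact("tests run with pytest", 0.9, datetime(2025, 1, 10), 3)
new = Fact("tests run with unittest", 0.6, datetime(2025, 3, 2), 1)

print(resolve(old, new, "confidence").statement)  # tests run with pytest
print(resolve(old, new, "recency").statement)     # tests run with unittest
```

The same pair of facts resolves differently depending on the strategy, which is why the choice of strategy matters as much as the stored values.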

The compaction survival feature is critical. Every Claude Code user knows the pain of the context window resetting mid-task. World Model MCP automatically re-injects the most important constraints and recent facts after compaction.
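
As a rough illustration of what re-injection could look like, the sketch below selects the highest-confidence constraints and the most recent facts and formats them as a short text block. `Constraint`, `RecentFact`, `build_reinjection`, and the default limits are all assumptions made for this example; the article does not describe how the server ranks or formats what it re-injects.

```python
from dataclasses import dataclass


@dataclass
class Constraint:
    text: str
    confidence: float


@dataclass
class RecentFact:
    text: str
    timestamp: float  # seconds since epoch


def build_reinjection(constraints, facts, max_constraints=3, max_facts=2):
    """Format the top constraints and newest facts as a compact text block."""
    # Highest-confidence constraints first (hypothetical ranking rule).
    top = sorted(constraints, key=lambda c: c.confidence, reverse=True)[:max_constraints]
    # Newest facts first.
    recent = sorted(facts, key=lambda f: f.timestamp, reverse=True)[:max_facts]
    lines = ["Constraints:"]
    lines += [f"- {c.text}" for c in top]
    lines.append("Recent facts:")
    lines += [f"- {f.text}" for f in recent]
    return "\n".join(lines)


constraints = [
    Constraint("never edit generated files", 0.95),
    Constraint("prefer pathlib over os.path", 0.40),
    Constraint("run migrations before tests", 0.80),
]
facts = [
    RecentFact("renamed utils.py to helpers.py", 1700000300.0),
    RecentFact("fixed off-by-one in pagination", 1700000100.0),
]
print(build_reinjection(constraints, facts, max_constraints=2, max_facts=1))
```

Capping both lists keeps the re-injected block small, which matters because it competes for space in a context window that has just been compacted.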

The Benchmark

The central piece of evidence is a repeat-mistake benchmark built on SWE-bench Verified: 50 tasks across django, sympy, matplotlib, scikit-learn, and sphinx, run as paired baseline-vs-treatment comparisons. Results:

+10.2 pts paired delta across 49 instances
+15.0 pts within-domain
+6.9 pts cross-domain
Zero regressions on out-of-domain tasks

Full per-task tables and mechanistic analysis are in benchmarks/repeat-mistake/RESULTS.md.
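
For readers unfamiliar with paired deltas, the sketch below shows one common way to compute such a figure: run each instance once under the baseline and once under the treatment, then take the difference in pass rates in percentage points. This is an assumption about how the number is defined, not the benchmark's confirmed method, and the toy data is invented for illustration and is not taken from RESULTS.md.

```python
def paired_delta_points(pairs):
    """Difference in pass rate (treatment minus baseline) in percentage points.

    pairs: list of (baseline_passed, treatment_passed) booleans, one per instance.
    """
    if not pairs:
        raise ValueError("need at least one paired instance")
    n = len(pairs)
    baseline_rate = sum(b for b, _ in pairs) / n
    treatment_rate = sum(t for _, t in pairs) / n
    return round((treatment_rate - baseline_rate) * 100, 1)


# Toy data only: ten made-up instances, not benchmark results.
toy_pairs = [
    (False, True), (True, True), (False, False), (True, True), (False, True),
    (True, True), (False, False), (True, False), (False, True), (True, True),
]
print(paired_delta_points(toy_pairs))
```

Because every instance appears in both arms, an instance that fails to run in either arm has to be dropped from both, which is one way a 50-task design can end up reporting a delta over fewer instances.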

How to Install and Use

Installation

# Clone the repo
git clone https://github.com/SaravananJaichandar/world-model-mcp
cd world-model-mcp

# Build (requires Rust)
cargo build --release

Configure with Claude Code

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "world-model": {
      "command": "./path/to/world-model-mcp/target/release/world-model-mcp",
      "args": ["serve"],
      "env": {
        "WORLD_MODEL_PATH": "/path/to/your/project/.world-model"
      }
    }
  }
}
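
Before restarting Claude Code, it can help to confirm that the config entry parses and has the expected fields. The Python sketch below is a hypothetical sanity check, not part of World Model MCP; `check_server_entry` is an illustrative name, and the set of required keys is inferred only from the snippet above.

```python
import json

# The config from the article, with its placeholder paths left as-is.
CONFIG = """
{
  "mcpServers": {
    "world-model": {
      "command": "./path/to/world-model-mcp/target/release/world-model-mcp",
      "args": ["serve"],
      "env": {
        "WORLD_MODEL_PATH": "/path/to/your/project/.world-model"
      }
    }
  }
}
"""


def check_server_entry(config_text, name="world-model"):
    """Return a list of problems found in the named server entry."""
    config = json.loads(config_text)  # raises ValueError on malformed JSON
    entry = config["mcpServers"][name]
    problems = []
    if not isinstance(entry.get("command"), str) or not entry["command"]:
        problems.append("command must be a non-empty string")
    if not isinstance(entry.get("args", []), list):
        problems.append("args must be a list")
    if "WORLD_MODEL_PATH" not in entry.get("env", {}):
        problems.append("env.WORLD_MODEL_PATH is missing")
    return problems


print(check_server_entry(CONFIG) or "config entry looks well-formed")
```

The check only covers the shape of the entry; it does not verify that the binary exists at the configured path.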

Key Commands

/world-model status — View current knowledge graph state
/world-model constraints — List learned constraints
/world-model compact — Trigger manual compaction
status-watch — TUI widget for live monitoring

When to Use It

World Model MCP shines in:

Large codebases where Claude Code repeatedly introduces the same bugs
Long-running tasks that hit context limits multiple times
Team projects where multiple developers use Claude Code on the same repo
Legacy code with undocumented constraints and gotchas

Limitations (v0.9.1)

Still early — v0.9.1, expect rough edges
Requires Rust toolchain to build
Antigravity adapter held for fourth release pending SDK changes
54% of MCP servers have zero community adoption per recent analysis — this one needs users to improve

Bottom Line

If you're tired of Claude Code making the same mistakes across sessions, World Model MCP is worth the 10-minute setup. The +10.2 pt SWE-bench improvement is real, and the compaction survival feature alone justifies the install for long coding sessions.

[Updated 25 Jun via hn_claude_code]

The v0.8.1 release expanded the contradiction-resolution benchmark to 105 pairs across 19 categories, and v0.8.0 added domain-aware confidence decay with a per-evidence-type TTL and per-item provenance fields (source_tool and confirmer) [per Hacker News]. The methodology was pre-registered and locked at benchmarks/repeat-mistake/DESIGN.md on 2026-06-17, before data collection, which guards against accusations of goalpost-moving.
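
To illustrate what confidence decay with a per-evidence-type TTL might mean in practice, here is a minimal Python sketch. It assumes a linear decay that reaches zero once an item's age hits the TTL for its evidence type. The decay curve, the evidence type names, and the TTL values are all assumptions; only the field names source_tool and confirmer come from the release notes.

```python
from dataclasses import dataclass

# Hypothetical TTLs in days per evidence type; the project's real values are not stated.
TTL_DAYS = {"test_result": 7.0, "user_correction": 90.0}


@dataclass
class Evidence:
    claim: str
    confidence: float   # confidence at the time the item was recorded
    evidence_type: str  # key into TTL_DAYS
    source_tool: str    # provenance: which tool produced the item
    confirmer: str      # provenance: who or what confirmed it


def decayed_confidence(item: Evidence, age_days: float) -> float:
    """Linearly decay confidence to zero as age approaches the type's TTL."""
    ttl = TTL_DAYS[item.evidence_type]
    remaining = max(0.0, 1.0 - age_days / ttl)
    return item.confidence * remaining


correction = Evidence("use tabs in Makefiles", 0.8, "user_correction", "editor", "user")
flaky = Evidence("test_login passes", 0.8, "test_result", "test-runner", "ci")

print(decayed_confidence(correction, 45.0))  # halfway through a 90-day TTL
print(decayed_confidence(flaky, 10.0))       # past its 7-day TTL
```

Giving short-lived evidence such as test results a smaller TTL than explicit user corrections lets stale observations fade while durable rules persist.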

Sources cited in this article

Hacker News

AI-assisted reporting. Generated by gentic.news from 1 verified source, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

Claude Code users should immediately install World Model MCP if they work on projects where repeated mistakes cost time. The +10.2 pt SWE-bench improvement is validated with a locked methodology, and the compaction survival feature directly addresses the most common pain point with long Claude Code sessions. To get started: clone the repo, build with `cargo build --release`, add to your MCP config, and run `/world-model status` to verify it's active. Start by letting it learn from a few sessions, then check `/world-model constraints` to see what it's captured. For existing projects with known bugs, manually inject constraints using the CLI subcommands. Watch for regressions — the benchmark shows zero out-of-domain regressions, but every codebase is different. If you hit issues, open a GitHub issue; the maintainer reads every one and prioritizes feedback-driven features.
