Why didn't the agent read the file body?

The recall response returned only the file title and a 120-character description, making it easy for the agent to assume it had read the file when it had only skimmed the index.

How does the new consumption audit work?

The recall_consumption.py module walks back through the session jsonl, finds recall blocks, extracts memory file paths, and checks subsequent assistant turns for matching Read tool calls.

Compass v1.1.0 Ships Recall Consumption Fix 12 Hours After Launch

Nautilus-Compass v1.1.0 fixes a recall consumption failure where agents saw file titles but didn't read bodies, embedding body text in top-3 hits and adding a drift detector for unconsumed recalls.

Ala SMITH & AI Research Desk · Jun 4, 2026 · 4 min read · AI-Generated

Source: dev.to via devto_mcp, hn_claude_code, reddit_claude · Widely Reported

What does Compass v1.1.0 fix regarding agent memory recall?

Nautilus-Compass v1.1.0, shipped 12 hours after v1.0.0, fixes a recall consumption failure where agents saw file titles but did not read the file body, reproducing prior mistakes. The fix embeds body text in top-3 hits and adds a drift detector for unconsumed recalls.

TL;DR

Recall surfaced files but agents didn't read bodies. · Top-3 hits now embed 800 chars of body content. · New module audits if recall hits were actually consumed.

Nautilus-Compass v1.1.0 shipped 12 hours after v1.0.0, fixing a recall consumption failure caught 5 hours post-launch. The bug: Claude Code agents saw file titles from memory recall but did not read the file body, reproducing prior mistakes.

Key facts

v1.1.0 shipped 12 hours after v1.0.0.
Top-3 hits embed first 800 characters of body.
35 negative anchors tracked by drift detector.
Tested on 130MB session: 41 recall hits, 0 consumed.
Reproduction cost: $3.50 end-to-end.

The Recall Consumption Bug

According to the Compass v1.1.0 release post, a sister Claude Code dialog was supposed to publish a long-form article to WeChat using a 6-step quality pipeline documented in cross-session memory. Compass recall fired correctly — the file appeared in the agent's UserPromptSubmit hook output. But the agent saw the title and 80-character description, then acted. It did not Read the file body. The actual rules — how to walk audit-gate, which wxid, what xhs-cards-embed structure looks like — never entered the agent's working context.

The agent then reproduced exactly the failure mode the file was written to prevent: ad-hoc _tmp_publish_v8.cjs scripts, no critic round, wrong login path. The user's diagnosis was sharp (translated from Chinese): "compass recalled it · I did not consume it · this is persona drift at the agent layer · not a failure of compass itself." The release post notes this is structural: returning title + 120-char description made it easy to skim and assume you had read the file when you had only read the index.

Three-Layer Fix

v0 — Embed body in top-3 hits. Top-3 recall hits now embed the first 800 characters of post-frontmatter body in an indented │ block. The agent gets the rules in its working context without an additional Read tool call. Tail hits 4..K stay header-only to keep the response bounded at ~3KB total.
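As a rough illustration of the v0 behavior, the sketch below renders a recall hit with an indented body excerpt for the top three ranks and a header-only line for the rest. The 800-character limit and the top-3 cutoff come from the release post; the function names, the header layout, and the assumption that frontmatter is a leading `---` delimited block are hypothetical and not taken from the Compass source.

```python
import re

BODY_CHARS = 800        # from the release post: first 800 characters of body
TOP_K_WITH_BODY = 3     # from the release post: only top-3 hits embed body

# Assumption: frontmatter is a leading block delimited by lines of "---".
FRONTMATTER = re.compile(r"\A---\s*\n.*?\n---\s*\n", re.DOTALL)


def strip_frontmatter(text: str) -> str:
    """Drop a leading frontmatter block, if one is present."""
    return FRONTMATTER.sub("", text, count=1)


def render_hit(rank: int, title: str, description: str, text: str) -> str:
    """Render one recall hit; only the top hits carry a body excerpt."""
    header = f"{rank}. {title}: {description}"  # hypothetical header layout
    if rank > TOP_K_WITH_BODY:
        return header  # tail hits stay header-only to bound response size
    excerpt = strip_frontmatter(text)[:BODY_CHARS]
    indented = "\n".join(f"   │ {line}" for line in excerpt.splitlines())
    return f"{header}\n{indented}"
```

Truncating after the frontmatter is stripped, rather than before, keeps metadata from eating into the 800-character budget.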

v1 — Embed past-mistake body in anti-anchor alerts. Compass's drift detector matches the current prompt against 35 negative anchors learned from prior mistakes. Previously, alerts just showed the anchor label. v1.1.0 embeds body from the most-relevant past lesson session using a two-tier match: substring 6-gram against the anchor + lesson-type frontmatter, falling back to recent drift!=green sessions.
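The release post does not spell out the matching rules, so the following is only one plausible reading of the two-tier match: first prefer lesson-type sessions whose body shares at least one character 6-gram with the anchor, and otherwise fall back to the most recent session whose drift status is not green. The `Session` fields, the overlap scoring, and the recency tiebreak are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

NGRAM = 6  # from the release post: "substring 6-gram"


@dataclass
class Session:
    session_id: str
    frontmatter_type: str  # hypothetical field, e.g. "lesson"
    drift: str             # hypothetical field, e.g. "green", "yellow", "red"
    body: str
    ended_at: float        # hypothetical timestamp used for recency


def ngrams(text: str, n: int = NGRAM) -> set[str]:
    """All character n-grams of the lowercased text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def pick_lesson(anchor: str, sessions: list[Session]) -> Optional[Session]:
    anchor_grams = ngrams(anchor)
    # Tier 1: lesson-type sessions sharing at least one 6-gram with the anchor.
    scored = [
        (len(anchor_grams & ngrams(s.body)), s)
        for s in sessions
        if s.frontmatter_type == "lesson"
    ]
    scored = [(score, s) for score, s in scored if score > 0]
    if scored:
        return max(scored, key=lambda pair: pair[0])[1]
    # Tier 2: fall back to the most recent session with drift != green.
    drifted = [s for s in sessions if s.drift != "green"]
    return max(drifted, key=lambda s: s.ended_at, default=None)
```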

v2 — Detect 'recall fired but not consumed'. A new module recall_consumption.py walks back through the live session jsonl file, finds recent recall blocks, extracts memory file paths, then checks subsequent assistant turns for matching Read tool calls. If recall surfaced N paths and 0 got read, that is the failure signature. Wired into the drift_check MCP tool result and a mid_session_hook every 25 tool calls, which only nags when >=3 unconsumed AND ratio < 0.3. Tested on a 130MB / 32k-line session: 41 recall hits surfaced, 0 consumed.
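The audit described above can be sketched in a few lines. This is not the actual recall_consumption.py: the record shape (`role`, `text`, `tool_calls`), the `[compass-recall]` marker, and the `.md` path pattern are simplified stand-ins for whatever the real session jsonl contains, and the sketch scans forward rather than walking back. Only the nag rule (at least 3 unconsumed and ratio below 0.3) is taken from the release post.

```python
import json
import re
from typing import Iterable

# Hypothetical: memory files are assumed to appear as paths ending in ".md".
PATH_RE = re.compile(r"[\w./~-]+\.md")
# Hypothetical marker identifying a recall block; not from the source.
RECALL_MARKER = "[compass-recall]"


def audit(lines: Iterable[str]) -> dict:
    """Count recall-surfaced paths and how many were later opened via Read."""
    surfaced: set[str] = set()
    consumed: set[str] = set()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        role = event.get("role")
        if role == "user" and RECALL_MARKER in event.get("text", ""):
            surfaced.update(PATH_RE.findall(event["text"]))
        elif role == "assistant":
            for call in event.get("tool_calls", []):
                path = call.get("input", {}).get("file_path")
                # Only Read calls made after the path was surfaced count.
                if call.get("name") == "Read" and path in surfaced:
                    consumed.add(path)
    unconsumed = surfaced - consumed
    ratio = len(consumed) / len(surfaced) if surfaced else 1.0
    return {
        "surfaced": len(surfaced),
        "consumed": len(consumed),
        "ratio": ratio,
        # Nag rule from the release post: >=3 unconsumed AND ratio < 0.3.
        "nag": len(unconsumed) >= 3 and ratio < 0.3,
    }
```

Under this rule, the reported test session (41 surfaced, 0 consumed) would trip the nag, while a session that surfaced two files and read neither would not, since it falls below the three-file floor.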

Governance Plan Scales Without Templates

v1.1.0 also ships a new governance_plan MCP tool that reads two file-exported registries: agents_capabilities.json (what each executor declares it can do) and anchor_packs_phases.json (per-domain DAG of phases). For each phase, V7 ranks executors by capability score (+10 capability match, +5 domain match, +3 anchor pack match), picks the highest, emits a queue file with depends_on_phase_ids. Verified on marketing/dev-tools and caishen-finance/audit domains. Adding a new domain requires one row in each registry — zero V7 source change.
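The scoring rule lends itself to a short sketch. The weights (+10, +5, +3) and the `depends_on_phase_ids` field come from the release post; the `Executor` and `Phase` shapes below are hypothetical, since the post does not show the schema of agents_capabilities.json or anchor_packs_phases.json.

```python
from dataclasses import dataclass, field


@dataclass
class Executor:  # hypothetical shape of one agents_capabilities.json row
    name: str
    capabilities: set = field(default_factory=set)
    domains: set = field(default_factory=set)
    anchor_packs: set = field(default_factory=set)


@dataclass
class Phase:  # hypothetical shape of one anchor_packs_phases.json phase
    phase_id: str
    capability: str
    domain: str
    anchor_pack: str
    depends_on_phase_ids: list = field(default_factory=list)


def score(executor: Executor, phase: Phase) -> int:
    """Weights from the release post: +10 capability, +5 domain, +3 anchor pack."""
    return (
        10 * (phase.capability in executor.capabilities)
        + 5 * (phase.domain in executor.domains)
        + 3 * (phase.anchor_pack in executor.anchor_packs)
    )


def plan(phases: list, executors: list) -> list:
    """Pick the highest-scoring executor per phase and emit queue entries."""
    queue = []
    for phase in phases:
        best = max(executors, key=lambda e: score(e, phase))
        queue.append({
            "phase_id": phase.phase_id,
            "executor": best.name,
            "score": score(best, phase),
            "depends_on_phase_ids": list(phase.depends_on_phase_ids),
        })
    return queue
```

Because the capability weight exceeds the domain and anchor-pack weights combined, an executor that declares the needed capability always outranks one that only matches on domain and pack. That ordering is what lets a new domain be added as registry rows rather than as new ranking code.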

Eval Numbers Unchanged

Eval numbers remain locked from 2026-05-08: LongMemEval-S 56.6% (n=500), EverMemBench-Dynamic Run 1 44.4% (n=500), Run 2 47.3% (n=497), drift detector ROC AUC 0.83, reproduction cost $3.50 end-to-end. v1.1.0 doesn't move the eval numbers — it moves the consumption numbers, the ratio of recall hits whose body actually lands in the agent's working context.

What to watch

Watch for a clean benchmark on recall consumption ratio — the project explicitly solicits suggestions. If adoption grows, look for similar consumption-audit features in competing agent memory systems (Zep, MemOS) within 90 days.

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

The core insight here is that memory recall is not memory consumption — a distinction most agent memory systems treat as an implementation detail rather than a first-class metric. Compass v1.1.0 makes the agent's failure to read surfaced files a detectable, auditable event. This matters because the dominant failure mode in long-running agents isn't retrieval quality (the file is there) but attentional drift (the agent doesn't bother to read it). The fix — embedding body text directly in the recall response — is elegant because it bypasses the agent's tool-calling discipline entirely. Rather than hoping the model will issue a Read tool call, Compass pre-loads the context. The consumption audit module is the more interesting long-term contribution: it creates a feedback signal that can drive future training or prompt engineering. The governance plan extension, meanwhile, shows a pragmatic approach to multi-agent orchestration: use file-exported registries rather than hardcoded templates, making the system extensible without code changes. The $3.50 reproduction cost is a notable contrast to $50+ GPT-4o-judge stacks, making this approach accessible for smaller teams.
