gentic.news — AI News Intelligence Platform


Compass v1.1.0 Ships Recall Consumption Fix 12 Hours After Launch

Nautilus-Compass v1.1.0 fixes a recall consumption failure where agents saw file titles but didn't read bodies, embedding body text in top-3 hits and adding a drift detector for unconsumed recalls.

Source: dev.to via devto_mcp, hn_claude_code, reddit_claude · Widely Reported
What does Compass v1.1.0 fix regarding agent memory recall?

Nautilus-Compass v1.1.0, shipped 12 hours after v1.0.0, fixes a recall consumption failure where agents saw file titles but did not read the file body, reproducing prior mistakes. The fix embeds body text in top-3 hits and adds a drift detector for unconsumed recalls.

TL;DR

Recall surfaced files, but agents didn't read their bodies. · Top-3 hits now embed the first 800 characters of body content. · A new module audits whether recall hits were actually consumed.

Nautilus-Compass v1.1.0 shipped 12 hours after v1.0.0, fixing a recall consumption failure caught 5 hours post-launch. The bug: Claude Code agents saw file titles from memory recall but did not read the file body, reproducing prior mistakes.

Key facts

  • v1.1.0 shipped 12 hours after v1.0.0.
  • Top-3 hits embed the first 800 characters of the body.
  • The drift detector tracks 35 negative anchors.
  • Tested on a 130MB session: 41 recall hits, 0 consumed.
  • Reproduction cost: $3.50 end-to-end.

The Recall Consumption Bug

According to the Compass v1.1.0 release post, a sister Claude Code dialog was supposed to publish a long-form article to WeChat using a 6-step quality pipeline documented in cross-session memory. Compass recall fired correctly: the file appeared in the agent's UserPromptSubmit hook output. But the agent saw only the title and the 80-character description, then acted. It never issued a Read call for the file body, so the actual rules (how to walk audit-gate, which wxid to use, what the xhs-cards-embed structure looks like) never entered the agent's working context.

The agent then reproduced exactly the failure mode the file was written to prevent: ad-hoc _tmp_publish_v8.cjs scripts, no critic round, wrong login path. The user's diagnosis was sharp: "compass recalled it · I didn't consume it · this is persona drift at the agent layer · not a failure of compass itself." The release post notes that the problem is structural: returning only a title and a 120-character description made it easy to skim and assume the file had been read when only the index had been.

Three-Layer Fix

v0 — Embed body in top-3 hits. Top-3 recall hits now embed the first 800 characters of post-frontmatter body in an indented block. The agent gets the rules in its working context without an additional Read tool call. Tail hits 4..K stay header-only to keep the response bounded at ~3KB total.
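The v0 layout can be sketched in Python. This is a minimal illustration under stated assumptions, not Compass's code: the RecallHit type, the format_recall and strip_frontmatter helpers, and the "---"-delimited frontmatter convention are hypothetical. Only the top-3 cutoff, the 800-character excerpt, and the header-only tail come from the release post.

```python
import textwrap
from dataclasses import dataclass

TOP_N_WITH_BODY = 3  # top hits that carry an inline body excerpt
BODY_CHARS = 800     # excerpt length, counted after the frontmatter


@dataclass
class RecallHit:
    # Hypothetical shape of one recall hit.
    path: str
    title: str
    description: str
    raw_text: str  # full file contents, possibly starting with frontmatter


def strip_frontmatter(text: str) -> str:
    """Drop a leading '---' ... '---' block if the file starts with one."""
    if text.startswith("---\n"):
        end = text.find("\n---", 3)
        if end != -1:
            after = text.find("\n", end + 4)  # newline closing the '---' line
            return text[after + 1:] if after != -1 else ""
    return text


def format_recall(hits: list[RecallHit]) -> str:
    """Header line for every hit; indented body excerpt for the top 3 only."""
    lines = []
    for rank, hit in enumerate(hits, start=1):
        lines.append(f"{rank}. {hit.title} ({hit.path}): {hit.description}")
        if rank <= TOP_N_WITH_BODY:
            excerpt = strip_frontmatter(hit.raw_text).strip()[:BODY_CHARS]
            lines.append(textwrap.indent(excerpt, "    "))
    return "\n".join(lines)
```

Keeping tail hits to a single header line is what bounds the response size: at most three 800-character excerpts plus one short line per remaining hit.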

v1 — Embed past-mistake body in anti-anchor alerts. Compass's drift detector matches the current prompt against 35 negative anchors learned from prior mistakes. Previously, alerts showed only the anchor label. v1.1.0 embeds the body of the most relevant past lesson session, found with a two-tier match: first, a substring 6-gram match against the anchor plus lesson-type frontmatter; failing that, a fallback to recent sessions whose drift status was not green (drift!=green).
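A rough sketch of the two-tier lookup follows. The release post specifies only the two tiers; character-level 6-grams, ranking tier-1 candidates by overlap count, and the PastSession fields and drift values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

NGRAM = 6  # "6-gram" per the release post; character-level is an assumption


@dataclass
class PastSession:
    # Hypothetical record for a past session's memory file.
    session_id: str
    body: str
    frontmatter_type: str  # e.g. "lesson"
    drift: str             # e.g. "green", "yellow", "red"
    timestamp: float       # larger = more recent


def char_ngrams(text: str, n: int = NGRAM) -> Set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def pick_lesson(anchor: str, sessions: List[PastSession]) -> Optional[PastSession]:
    anchor_grams = char_ngrams(anchor)
    # Tier 1: lesson-type sessions sharing at least one 6-gram with the anchor;
    # the one with the most shared 6-grams wins (ranking rule is an assumption).
    best, best_overlap = None, 0
    for s in sessions:
        if s.frontmatter_type != "lesson":
            continue
        overlap = len(anchor_grams & char_ngrams(s.body))
        if overlap > best_overlap:
            best, best_overlap = s, overlap
    if best is not None:
        return best
    # Tier 2: fall back to the most recent session whose drift was not green.
    drifted = [s for s in sessions if s.drift != "green"]
    return max(drifted, key=lambda s: s.timestamp, default=None)
```

Substring n-grams need no tokenizer, which suits mixed Chinese and English memory files; the recency fallback ensures an alert can still carry some past-mistake context when no lesson matches.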

v2 — Detect 'recall fired but not consumed'. A new module, recall_consumption.py, walks back through the live session jsonl file, finds recent recall blocks, extracts the memory file paths they surfaced, then checks subsequent assistant turns for matching Read tool calls. If recall surfaced N paths and none got read, that is the failure signature. The check is wired into the drift_check MCP tool result and into a mid_session_hook that runs every 25 tool calls; the hook nags only when at least 3 recalls are unconsumed and the consumption ratio is below 0.3. On a 130MB / 32k-line test session, it found 41 recall hits surfaced and 0 consumed.
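The consumption audit and the nag threshold can be approximated as below. This is a hedged sketch, not recall_consumption.py: the record shape (role, text, tool_calls), the <compass-recall> marker, and the memory-path regex are hypothetical simplifications rather than Claude Code's actual transcript schema. The sketch also reads the log forward and counts distinct paths rather than raw hits. The 25-call cadence and the 3-unconsumed / 0.3-ratio thresholds come from the release post.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Iterable, Set

RECALL_MARKER = "<compass-recall>"           # hypothetical recall-block marker
PATH_RE = re.compile(r"\S+/memory/\S+\.md")  # hypothetical memory-path shape
NAG_EVERY = 25                               # hook cadence, in tool calls
MIN_UNCONSUMED = 3
MAX_RATIO = 0.3


@dataclass
class ConsumptionReport:
    surfaced: Set[str] = field(default_factory=set)
    consumed: Set[str] = field(default_factory=set)

    @property
    def unconsumed(self) -> int:
        return len(self.surfaced - self.consumed)

    @property
    def ratio(self) -> float:
        # Share of surfaced paths that were later Read; 1.0 if nothing surfaced.
        return len(self.consumed) / len(self.surfaced) if self.surfaced else 1.0


def audit_consumption(jsonl_lines: Iterable[str]) -> ConsumptionReport:
    report = ConsumptionReport()
    for line in jsonl_lines:
        rec = json.loads(line)
        if rec.get("role") == "user" and RECALL_MARKER in rec.get("text", ""):
            report.surfaced.update(PATH_RE.findall(rec["text"]))
        elif rec.get("role") == "assistant":
            for call in rec.get("tool_calls", []):
                path = call.get("input", {}).get("file_path")
                # Only Reads that come after the path was surfaced count.
                if call.get("name") == "Read" and path in report.surfaced:
                    report.consumed.add(path)
    return report


def should_nag(report: ConsumptionReport, tool_calls: int) -> bool:
    if tool_calls == 0 or tool_calls % NAG_EVERY != 0:
        return False
    return report.unconsumed >= MIN_UNCONSUMED and report.ratio < MAX_RATIO
```

Requiring both conditions keeps the hook quiet when only one or two low-ranked hits go unread, and reserves the nag for the "surfaced many, read almost none" pattern the article describes.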

Governance Plan Scales Without Templates

v1.1.0 also ships a new governance_plan MCP tool that reads two file-exported registries: agents_capabilities.json (what each executor declares it can do) and anchor_packs_phases.json (a per-domain DAG of phases). For each phase, V7 ranks executors by capability score (+10 for a capability match, +5 for a domain match, +3 for an anchor pack match), picks the highest-scoring one, and emits a queue file with depends_on_phase_ids. The tool has been verified on the marketing/dev-tools and caishen-finance/audit domains. Adding a new domain requires one row in each registry and zero V7 source changes.
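The scoring rule can be illustrated with a small planner. The registry schemas (the capabilities, domains, anchor_packs, capability, anchor_pack, and depends_on keys) and the score, plan, and plan_from_files helpers are hypothetical; only the two file names, the +10/+5/+3 weights, the pick-the-highest rule, and the depends_on_phase_ids field come from the article.

```python
import json
from typing import Dict, List

W_CAPABILITY, W_DOMAIN, W_ANCHOR_PACK = 10, 5, 3  # weights from the release post


def score(executor: dict, phase: dict, domain: str) -> int:
    s = 0
    if phase["capability"] in executor.get("capabilities", []):
        s += W_CAPABILITY
    if domain in executor.get("domains", []):
        s += W_DOMAIN
    if phase.get("anchor_pack") in executor.get("anchor_packs", []):
        s += W_ANCHOR_PACK
    return s


def plan(domain: str, agents: Dict[str, dict], phases: Dict[str, List[dict]]) -> List[dict]:
    queue = []
    for phase in phases[domain]:
        # Highest score wins; max() keeps the first executor listed on ties.
        best = max(agents, key=lambda name: score(agents[name], phase, domain))
        queue.append({
            "phase_id": phase["id"],
            "executor": best,
            "depends_on_phase_ids": phase.get("depends_on", []),
        })
    return queue


def plan_from_files(domain: str) -> str:
    # File names from the article; their JSON schema here is hypothetical.
    with open("agents_capabilities.json") as f:
        agents = json.load(f)
    with open("anchor_packs_phases.json") as f:
        phases = json.load(f)
    return json.dumps(plan(domain, agents, phases), indent=2)
```

Because the weights live in code while executors and domains live in the registries, onboarding a domain is a data change only, which is the property the article highlights.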

Eval Numbers Unchanged

Eval numbers remain locked from 2026-05-08: LongMemEval-S 56.6% (n=500), EverMemBench-Dynamic Run 1 44.4% (n=500), Run 2 47.3% (n=497), drift detector ROC AUC 0.83, reproduction cost $3.50 end-to-end. v1.1.0 doesn't move the eval numbers — it moves the consumption numbers, the ratio of recall hits whose body actually lands in the agent's working context.

What to watch

Watch for a clean benchmark on recall consumption ratio — the project explicitly solicits suggestions. If adoption grows, look for similar consumption-audit features in competing agent memory systems (Zep, MemOS) within 90 days.



AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.


AI Analysis

The core insight here is that memory recall is not memory consumption — a distinction most agent memory systems treat as an implementation detail rather than a first-class metric. Compass v1.1.0 makes the agent's failure to read surfaced files a detectable, auditable event. This matters because the dominant failure mode in long-running agents isn't retrieval quality (the file is there) but attentional drift (the agent doesn't bother to read it). The fix — embedding body text directly in the recall response — is elegant because it bypasses the agent's tool-calling discipline entirely. Rather than hoping the model will issue a Read tool call, Compass pre-loads the context. The consumption audit module is the more interesting long-term contribution: it creates a feedback signal that can drive future training or prompt engineering. The governance plan extension, meanwhile, shows a pragmatic approach to multi-agent orchestration: use file-exported registries rather than hardcoded templates, making the system extensible without code changes. The $3.50 reproduction cost is a notable contrast to $50+ GPT-4o-judge stacks, making this approach accessible for smaller teams.