Anthropic's Silent Cache TTL Cut

Claude Code's default cache TTL was silently reduced to 5 minutes on April 2, drastically increasing token costs. Use hooks and settings to mitigate the impact.

GAla Smith & AI Research Desk·11h ago·4 min read·5 views·AI-Generated
Source: reddit.com via reddit_claude, hn_claude_code · Corroborated

What Changed

On April 2, Anthropic silently changed the default cache time-to-live (TTL) for Claude Code from 1 hour to 5 minutes. The change was not announced in any changelog, and the official documentation still states "up to 1 hour." The usage data leaves little doubt: an analysis of 1,140 sessions shows 100% usage of the ephemeral_1h tier before April 2, a mix of both tiers on April 2, and 100% usage of the ephemeral_5m tier from April 3 onward.
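One way to run this kind of check on your own data is to tally cache-write tokens per tier for each day. The sketch below assumes each usage record exposes a cache_creation object with ephemeral_1h_input_tokens and ephemeral_5m_input_tokens counters; the record shape (a "day" key plus a "usage" dict) and the sample values are invented for illustration and are not the analysis behind the 1,140-session figure.

```python
from collections import defaultdict

def tier_share_by_day(records):
    """Return {day: fraction of cache-write tokens that used the 1h tier}.

    Each record is a hypothetical dict {"day": "MM-DD", "usage": {...}},
    where usage["cache_creation"] holds per-tier token counts.
    """
    totals = defaultdict(lambda: [0, 0])  # day -> [1h tokens, 5m tokens]
    for rec in records:
        creation = rec["usage"].get("cache_creation", {})
        totals[rec["day"]][0] += creation.get("ephemeral_1h_input_tokens", 0)
        totals[rec["day"]][1] += creation.get("ephemeral_5m_input_tokens", 0)

    shares = {}
    for day, (one_hour, five_min) in sorted(totals.items()):
        written = one_hour + five_min
        if written:  # skip days with no cache writes at all
            shares[day] = one_hour / written
    return shares

# Invented sample data mirroring the before / mixed / after pattern.
sample = [
    {"day": "04-01", "usage": {"cache_creation": {"ephemeral_1h_input_tokens": 1000}}},
    {"day": "04-02", "usage": {"cache_creation": {"ephemeral_1h_input_tokens": 500}}},
    {"day": "04-02", "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 500}}},
    {"day": "04-03", "usage": {"cache_creation": {"ephemeral_5m_input_tokens": 800}}},
]
shares = tier_share_by_day(sample)
print(shares)
```

A share of 1.0 means every cache write that day used the 1-hour tier, and 0.0 means every write used the 5-minute tier.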

The Impact on Your Workflow and Wallet

The practical effect is severe. For one developer, daily cache busts increased from 39 to 199—a 5.1x multiplier. The associated cost rose from $6.28/day to $15.54/day, projecting a $277.80 monthly increase from this single change.
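The figures above can be reproduced with a few lines of arithmetic. This sketch uses only the numbers quoted in the article and assumes a 30-day month for the projection.

```python
busts_before, busts_after = 39, 199
cost_before, cost_after = 6.28, 15.54  # USD per day, as quoted above

bust_multiplier = busts_after / busts_before
daily_increase = cost_after - cost_before
monthly_increase = daily_increase * 30  # assumes a 30-day month

print(f"{bust_multiplier:.1f}x busts, "
      f"+${daily_increase:.2f}/day, +${monthly_increase:.2f}/month")
```

Note that cost grew by roughly 2.5x while busts grew by 5.1x, so in this developer's data not every additional bust was equally expensive.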

The shorter TTL creates a compounding problem:

  1. Mid-Session Expiry: Cache expires while you're still working, causing Claude to lose confidence in its context.
  2. Redundant Reads: Claude re-reads files to re-establish context, padding conversation history.
  3. More Expensive Rebuilds: That padded history makes the next cache rebuild even costlier.

A critical, hidden trap is backgrounded tasks. When Claude runs a long tool call, agent, or a command like /loop with an interval over 5 minutes, it suspends the session. If the task takes longer than 5 minutes to return, the cache expires before you see the result. Your next turn pays full price to rebuild the context you just had.
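To see why pauses matter so much more under the shorter TTL, here is a minimal simulation. It makes a simplifying assumption that every turn refreshes the cache for a full TTL window, and the session timeline is hypothetical.

```python
def count_cache_busts(turn_times_min, ttl_min):
    """Count turns that arrive after the cache has already expired.

    turn_times_min: ascending timestamps (in minutes) of turns hitting the API.
    Simplifying assumption: each turn refreshes the cache for ttl_min minutes.
    """
    busts = 0
    for prev, cur in zip(turn_times_min, turn_times_min[1:]):
        if cur - prev > ttl_min:
            busts += 1
    return busts

# Hypothetical session: quick edits, a 7-minute build, then a 20-minute break.
turns = [0, 2, 4, 11, 13, 33, 35]
busts_1h = count_cache_busts(turns, ttl_min=60)
busts_5m = count_cache_busts(turns, ttl_min=5)
print(busts_1h, busts_5m)
```

In this toy timeline the 1-hour TTL survives both the build and the break, while the 5-minute TTL pays for a full rebuild after each of them.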

How to Fight Back: Settings and Hooks

You can't change the TTL, but you can change your configuration to minimize the damage.

1. Install the Token Insights Skill

The claude-memory plugin from the gupsammy/Claudest repository provides data and automated hooks.

/plugin marketplace add gupsammy/claudest
/plugin install claude-memory@claudest

Run /get-token-insights in a session. If Claude detects the 5-minute TTL pattern in your data, it will offer to install protective hooks automatically.

2. Configure Protective Hooks (Manual Setup)

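Add these hooks to your ~/.claude/settings.json file to get warnings before a cache bust occurs. Claude Code reads hook commands from a top-level "hooks" key, with each event holding a list of command entries:

{
  "hooks": {
    "Stop": [
      { "hooks": [
        { "type": "command", "command": "plugins/claude-memory/hooks/cache-warn-stop.py" },
        { "type": "command", "command": "plugins/claude-memory/hooks/cache-warn-3min.sh" }
      ] }
    ],
    "UserPromptSubmit": [
      { "hooks": [
        { "type": "command", "command": "plugins/claude-memory/hooks/cache-expiry-warn.py" }
      ] }
    ]
  }
}
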
  • cache-expiry-warn.py: Warns you when a prompt submission is about to trigger a cache bust.
  • cache-warn-stop.py & cache-warn-3min.sh: A two-part system that starts a background timer and stops the session before the 5-minute expiry.
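
The core decision behind a warning hook like these can be sketched in a few lines. This is not the plugin's actual code: the function name, the 60-second warning margin, and the three status labels are assumptions made for illustration.

```python
import time

CACHE_TTL_SECONDS = 5 * 60   # the TTL described in this article
WARN_MARGIN_SECONDS = 60     # hypothetical safety margin before expiry

def cache_status(last_activity_ts, now_ts=None):
    """Classify the cache as warm, expiring, or expired from idle time."""
    now_ts = time.time() if now_ts is None else now_ts
    idle = now_ts - last_activity_ts
    if idle >= CACHE_TTL_SECONDS:
        return "expired"    # the next prompt would pay for a full rebuild
    if idle >= CACHE_TTL_SECONDS - WARN_MARGIN_SECONDS:
        return "expiring"   # still cached, but only for under a minute
    return "warm"

print(cache_status(last_activity_ts=0, now_ts=250))
```

A hook built on this idea would record the time of the last response and, when a prompt is submitted, warn if the status is "expired" so you can decide whether to continue or start fresh.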

3. Adjust Your Global Settings

Cap your context window and enable clearer session management:

{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "ENABLE_TOOL_SEARCH": "1"
  },
  "showClearContextOnPlanAccept": true
}
  • "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1": This is the single most impactful setting. It caps your context at 200K tokens instead of 1 million. When a cache bust happens, you rebuild from scratch, and at the same per-token rate a 1M-token rebuild costs ~5x more than a 200K-token rebuild. With the TTL now 12x shorter than before April 2 (and measured busts up about 5x for the developer above), this setting is essential for cost control.
  • "showClearContextOnPlanAccept": true: This adds a button to clear context after accepting a plan. It's useful for separating planning and implementation into cleaner, cheaper sessions.
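
The 5x figure follows directly from the token counts. The sketch below assumes a flat per-token cache-write price; the PRICE value is a placeholder for illustration, not a quoted rate, and the ratio does not depend on it.

```python
def rebuild_cost(context_tokens, price_per_million):
    """Cost of re-writing the full context to cache after a bust."""
    return context_tokens / 1_000_000 * price_per_million

PRICE = 3.75  # hypothetical USD per million cache-write tokens

cost_200k = rebuild_cost(200_000, PRICE)
cost_1m = rebuild_cost(1_000_000, PRICE)
rebuild_ratio = cost_1m / cost_200k

ttl_ratio = 60 / 5  # old TTL in minutes divided by new TTL in minutes

print(f"rebuild ratio: {rebuild_ratio:.1f}x, TTL ratio: {ttl_ratio:.0f}x")
```

If long-context requests are billed at a higher rate than shorter ones, the real gap would be wider than this flat-rate estimate.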

The Official Stance and Your Action Plan

Boris, the creator of Claude Code, has acknowledged the caching issue on GitHub. The official advice is to start new conversations more frequently to avoid large cache misses, and to be selective with skills/agents to keep context lean.

Your new workflow should be:

  1. Monitor: Run /get-token-insights to see your cache bust pattern.
  2. Protect: Install the hooks to get warnings before you lose cache.
  3. Limit: Cap your context at 200K via the environment variable.
  4. Segment: Use showClearContextOnPlanAccept and consciously start new sessions for long-running tasks or after breaks.
  5. Feedback: Use Claude Code's /feedback command to report specific instances of costly cache behavior. More data helps Anthropic prioritize a fix.

This isn't just about cost—it's about predictability. A 5-minute TTL turns any pause into a potential budget-buster. Configure your environment now to take back control.

AI Analysis

Claude Code users must immediately change their configuration. The silent TTL change is a fundamental shift in the economics of your sessions.

  1. Disable the 1M context window: Add "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1" to your env in settings.json. This is non-negotiable. Every cache bust now rebuilds your entire context. Rebuilding 200K tokens is bad; rebuilding 1M tokens is catastrophic at the new bust frequency.
  2. Install the warning hooks: Whether via the claude-memory plugin's automated offer or manual setup, you need a system that alerts you before the 5-minute expiry. The hooks allow you to strategically stop a session, copy the relevant context, and start a new one, converting an expensive, silent cache bust into a managed, cheaper context transfer.
  3. Adapt your session discipline: The era of leaving a Claude Code session open all day is over. Treat sessions as ephemeral. Use the new showClearContextOnPlanAccept setting. For any task you know will involve a pause (waiting for a build, a long agent run, a break), proactively start a fresh conversation. Your mental model must shift from "1-hour persistence" to "5-minute sprint."
