Extended Thinking's Two-Block Response: What Claude Code Users Need to Know

Extended Thinking returns separate thinking and text blocks - handle them correctly in streaming or your UI will show raw reasoning.

Ggentic.news Editorial·2d ago·3 min read·4 views
Source: dev.to via devto_anthropic (single source)

What Changed — Extended Thinking's Response Structure

If you're calling the Claude API with Extended Thinking enabled, for example from tooling built around Claude Code, a typical response contains two distinct content blocks, not one. This isn't a new feature, but it is a critical implementation detail many developers miss:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me reason through this step by step...",
      "signature": "eyJhbGciOiJFZDI1NTE5..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}
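
One way to consume this structure is to dispatch on each block's type rather than on its position. The sketch below is a minimal Python illustration, assuming the response has already been parsed into a dict; split_blocks is a hypothetical helper name, not part of any SDK.

```python
from typing import Any


def split_blocks(content: list[dict[str, Any]]) -> tuple[str, str]:
    """Return (thinking, answer) by dispatching on each block's "type"."""
    thinking_parts: list[str] = []
    text_parts: list[str] = []
    for block in content:
        kind = block.get("type")
        if kind == "thinking":
            thinking_parts.append(block.get("thinking", ""))
        elif kind == "text":
            text_parts.append(block.get("text", ""))
        # Any other block type is ignored here rather than guessed at.
    return "".join(thinking_parts), "".join(text_parts)


# The response shape shown in the article, as a parsed dict.
response = {
    "content": [
        {
            "type": "thinking",
            "thinking": "Let me reason through this step by step...",
            "signature": "eyJhbGciOiJFZDI1NTE5...",
        },
        {"type": "text", "text": "The answer is 42."},
    ]
}

thinking, answer = split_blocks(response["content"])
print(answer)  # The answer is 42.
```

Only the second value should reach the user; the first is useful for logs or a collapsible debug view.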

This follows Anthropic's continued refinement of Claude's reasoning capabilities, part of their broader push into more sophisticated AI tooling that includes recent launches like Claude Code Auto Mode and the multi-agent harness system.

What It Means For Your Claude Code Workflows

Streaming Parsers Need Block Awareness

When streaming responses, you get thinking_delta events first, then text_delta events. If your parser doesn't distinguish between delta types, you'll concatenate Claude's internal reasoning into your user-facing output. Many developers get this right by accident because they only read the last content block, but that works by luck, not by design.
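
A block-aware parser can be sketched as follows. This is an illustration only: it assumes the stream has already been decoded into dicts and that each delta arrives wrapped in a content_block_delta event with a nested delta object. The thinking_delta and text_delta names come from the article; the envelope fields, the signature_delta event, and the collect_stream helper are assumptions made for the example.

```python
from typing import Any, Iterable


def collect_stream(events: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Accumulate reasoning and answer text into separate buffers."""
    out = {"thinking": "", "text": ""}
    for event in events:
        if event.get("type") != "content_block_delta":
            continue  # start/stop and other events carry no text here
        delta = event.get("delta", {})
        if delta.get("type") == "thinking_delta":
            out["thinking"] += delta.get("thinking", "")
        elif delta.get("type") == "text_delta":
            out["text"] += delta.get("text", "")
        # Other delta types (assumed, e.g. signature_delta) are skipped.
    return out


# Hand-written sample events in the assumed shape.
sample_events = [
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "Let me reason "}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "thinking_delta", "thinking": "step by step."}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "signature_delta", "signature": "eyJhbGci..."}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "The answer "}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "text_delta", "text": "is 42."}},
]

result = collect_stream(sample_events)
print(result["text"])  # The answer is 42.
```

Keeping two buffers means the UI can render result["text"] directly and decide separately whether the reasoning is ever shown.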

Thinking Blocks Are Read-Only

The signature field in thinking blocks is a cryptographic tamper seal. The API verifies this signature when you send thinking blocks back as conversation history, so a modified block will fail verification. You cannot edit thinking blocks: if you're building a UI that allows message editing, thinking blocks must be treated as read-only. This prevents developers from modifying Claude's reasoning to steer it in unsafe directions.
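
In an editing UI, the read-only rule can be enforced at the point where an edit is applied. The following is a rough sketch under stated assumptions: messages are stored as lists of block dicts, and edit_block and READ_ONLY_TYPES are hypothetical names invented for this example.

```python
import copy
from typing import Any

# Block types that must be passed back to the API unmodified.
READ_ONLY_TYPES = {"thinking", "redacted_thinking"}


def edit_block(content: list[dict[str, Any]], index: int,
               new_text: str) -> list[dict[str, Any]]:
    """Return a copy of content with one text block replaced.

    Raises PermissionError for signed reasoning blocks, since changing
    them would invalidate the signature.
    """
    block = content[index]
    kind = block.get("type")
    if kind in READ_ONLY_TYPES:
        raise PermissionError(f"{kind} blocks are read-only")
    if kind != "text":
        raise ValueError(f"cannot edit block of type {kind!r}")
    updated = copy.deepcopy(content)  # leave the caller's history intact
    updated[index]["text"] = new_text
    return updated


history = [
    {"type": "thinking", "thinking": "Let me reason...", "signature": "eyJ..."},
    {"type": "text", "text": "The answer is 42."},
]

edited = edit_block(history, 1, "The answer is forty-two.")
print(edited[1]["text"])  # The answer is forty-two.
```

Returning a copy rather than mutating in place keeps the original signed blocks available if the user cancels the edit.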

Redacted Thinking Happens

Sometimes you'll get:

{
  "type": "redacted_thinking",
  "data": "encrypted_content_here..."
}

This occurs when Claude's safety systems flag the thinking content. The encrypted data preserves context for follow-up messages without exposing sensitive reasoning, so pass the block back unchanged in conversation history. For testing, there's a magic string you can send to force a redacted thinking response; use it to confirm your app doesn't crash.
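
Handling this gracefully mostly means never assuming a block has a text or thinking field. Below is a minimal sketch, assuming parsed block dicts as above; render_for_user is a hypothetical helper, and the sample data value is a placeholder, not real encrypted content.

```python
from typing import Any


def render_for_user(content: list[dict[str, Any]]) -> str:
    """Build user-facing output, skipping every kind of reasoning block."""
    visible: list[str] = []
    for block in content:
        kind = block.get("type")
        if kind == "text":
            visible.append(block.get("text", ""))
        elif kind in ("thinking", "redacted_thinking"):
            continue  # kept in history, never shown as the answer
        # Unknown block types are skipped instead of raising.
    return "".join(visible)


content = [
    {"type": "redacted_thinking", "data": "encrypted_content_here..."},
    {"type": "text", "text": "The answer is 42."},
]

print(render_for_user(content))  # The answer is 42.
```

The content list itself is left untouched, so the redacted block can still be sent back as part of the conversation history.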

Try It Now — Practical Configuration

Budget Tokens vs. Max Tokens

Extended Thinking takes a budget_tokens parameter with a minimum of 1,024 tokens. The critical constraint: max_tokens must be greater than budget_tokens. If your thinking budget is 1,024, max_tokens must be at least 1,025, which would leave just 1 token for the actual response.

Practical configuration:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "max_tokens": 4000
}

This allows up to 1,024 tokens for thinking and leaves at least 2,976 tokens (4,000 minus 1,024) for the actual response.
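
The arithmetic and both constraints can be checked before a request is sent. This is a small sketch based only on the rules stated above (a 1,024-token minimum, and max_tokens strictly greater than budget_tokens); response_headroom is a hypothetical helper name.

```python
MIN_BUDGET_TOKENS = 1024  # minimum thinking budget stated in the article


def response_headroom(max_tokens: int, budget_tokens: int) -> int:
    """Return tokens left for the answer if the full budget is used."""
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(
            f"budget_tokens must be at least {MIN_BUDGET_TOKENS}"
        )
    if max_tokens <= budget_tokens:
        raise ValueError("max_tokens must be greater than budget_tokens")
    return max_tokens - budget_tokens


print(response_headroom(4000, 1024))  # 2976
print(response_headroom(1025, 1024))  # 1
```

The second call is valid but shows why the bare minimum is impractical: a single token is not a usable answer.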

When to Enable Extended Thinking

Anthropic's own advice: Don't reach for Extended Thinking first. Improve your prompts, run evaluations, and only enable thinking when accuracy isn't meeting requirements. Extended Thinking tokens are charged as output tokens, add latency, and come with trade-offs:

  • Temperature is locked at 1.0 when Extended Thinking is enabled
  • Thinking tokens count toward your total token usage
  • For straightforward coding tasks, it's often overhead without benefit
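
Because of the temperature restriction, it can help to build request parameters in one place so the two settings never conflict. The sketch below is one possible approach, not an official pattern: sampling_params is a hypothetical helper, and it assumes the simplest policy of leaving temperature unset whenever thinking is enabled.

```python
from typing import Any, Optional


def sampling_params(use_thinking: bool,
                    budget_tokens: int = 1024,
                    max_tokens: int = 4000,
                    temperature: Optional[float] = None) -> dict[str, Any]:
    """Build the thinking/temperature portion of a request body."""
    params: dict[str, Any] = {"max_tokens": max_tokens}
    if use_thinking:
        # The article states temperature is locked at 1.0 with thinking on,
        # so any other explicit value is rejected here.
        if temperature is not None and temperature != 1.0:
            raise ValueError("temperature cannot be changed with thinking on")
        params["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    elif temperature is not None:
        params["temperature"] = temperature
    return params


print(sampling_params(use_thinking=False, temperature=0.2))
# {'max_tokens': 4000, 'temperature': 0.2}
```

Failing loudly on a conflicting temperature makes the trade-off visible at the call site instead of surfacing as an API error later.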

Implementation Checklist for Claude Code Users

  1. Update streaming parsers to handle thinking_delta and text_delta separately
  2. Make thinking blocks read-only in any UI that allows message editing
  3. Handle redacted_thinking blocks gracefully, because they will occur
  4. Set max_tokens significantly higher than budget_tokens (at least a 3:1 ratio)
  5. Test with the magic string to ensure redacted thinking doesn't break your app
  6. Only enable Extended Thinking for complex reasoning tasks, not routine coding

This structural understanding is essential as Anthropic continues expanding Claude's capabilities, with recent developments including Claude Code Auto Mode for permission decisions and multi-agent systems for complex software engineering tasks.

AI Analysis

Claude Code users should immediately check their streaming implementations. If you're using the API with Extended Thinking enabled, verify your parser distinguishes between `thinking_delta` and `text_delta` events. Otherwise, users might see Claude's raw reasoning in your interface.

Update your Claude Code configurations: when using Extended Thinking, set `max_tokens` to at least 3,000-4,000 with a `budget_tokens` of 1,024. This ensures adequate response length after thinking completes. Remember that temperature is locked at 1.0 when Extended Thinking is active; factor this into your prompt design for coding tasks where consistency matters.

For testing, implement handling for `redacted_thinking` blocks. These will occur in production when Claude's safety systems trigger, and your app shouldn't crash. Consider adding a development flag to force redacted thinking responses during testing using Anthropic's documented magic string.