Extended Thinking's Two-Block Response: What Claude Code Users Need to Know

Extended Thinking returns separate thinking and text blocks - handle them correctly in streaming or your UI will show raw reasoning.

Ggentic.news Editorial·2d ago·3 min read·4 views

Source: dev.tovia devto_anthropicSingle Source

What Changed — Extended Thinking's Response Structure

If you're using Claude Code's API with Extended Thinking enabled, you're getting two distinct content blocks in every response, not one. This isn't a new feature but a critical implementation detail many developers miss:

{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me reason through this step by step...",
      "signature": "eyJhbGciOiJFZDI1NTE5..."
    },
    {
      "type": "text",
      "text": "The answer is 42."
    }
  ]
}

This follows Anthropic's continued refinement of Claude's reasoning capabilities, part of their broader push into more sophisticated AI tooling that includes recent launches like Claude Code Auto Mode and the multi-agent harness system.

What It Means For Your Claude Code Workflows

Streaming Parsers Need Block Awareness

When streaming responses, you get thinking_delta events first, then text_delta events. If your parser doesn't distinguish between block types, you'll concatenate Claude's internal reasoning into your user-facing output. Many developers accidentally get this right by only reading the last content block—but that's luck, not correct implementation.

Thinking Blocks Are Read-Only

The signature field in thinking blocks is a cryptographic tamper seal. Claude verifies this signature when you send thinking blocks back as conversation history. You cannot edit thinking blocks—if you're building a UI that allows message editing, thinking blocks must be treated as read-only. This prevents developers from modifying Claude's reasoning to steer it in unsafe directions.

Redacted Thinking Happens

Sometimes you'll get:

{
  "type": "redacted_thinking",
  "data": "encrypted_content_here..."
}

This occurs when Claude's safety systems flag the thinking content. The encrypted data preserves context for follow-up messages without exposing sensitive reasoning. For testing, there's a magic string you can send to force redacted thinking responses—use it to ensure your app doesn't crash.

Try It Now — Practical Configuration

Budget Tokens vs. Max Tokens

Cover image for Extended Thinking Returns Two Blocks, Not One (Anthropic Academy Part 2)

Extended Thinking takes a budget_tokens parameter with a minimum of 1,024 tokens. The critical constraint: max_tokens must be greater than budget_tokens. If your thinking budget is 1,024, max_tokens must be at least 1,025—leaving just 1 token for the actual response.

Practical configuration:

{
  "thinking": {
    "type": "enabled",
    "budget_tokens": 1024
  },
  "max_tokens": 4000
}

This gives 1,024 tokens for thinking and 2,976 tokens for the actual response.

When to Enable Extended Thinking

Anthropic's own advice: Don't reach for Extended Thinking first. Improve your prompts, run evaluations, and only enable thinking when accuracy isn't meeting requirements. Extended Thinking tokens are charged as output tokens, add latency, and come with trade-offs:

Temperature is locked at 1.0 when Extended Thinking is enabled
Thinking tokens count toward your total token usage
For straightforward coding tasks, it's often overhead without benefit

Implementation Checklist for Claude Code Users

Update streaming parsers to handle thinking_delta and text_delta separately
Make thinking blocks read-only in any UI that allows message editing
Handle redacted_thinking blocks gracefully—they will occur
Set max_tokens significantly higher than budget_tokens (at least 3:1 ratio)
Test with the magic string to ensure redacted thinking doesn't break your app
Only enable Extended Thinking for complex reasoning tasks, not routine coding

This structural understanding is essential as Anthropic continues expanding Claude's capabilities, with recent developments including Claude Code Auto Mode for permission decisions and multi-agent systems for complex software engineering tasks.

AI Analysis

Claude Code users should immediately check their streaming implementations. If you're using the API with Extended Thinking enabled, verify your parser distinguishes between `thinking_delta` and `text_delta` events. Otherwise, users might see Claude's raw reasoning in your interface. Update your Claude Code configurations: when using Extended Thinking, set `max_tokens` to at least 3,000-4,000 with a `budget_tokens` of 1,024. This ensures adequate response length after thinking completes. Remember that temperature is locked at 1.0 when Extended Thinking is active—factor this into your prompt design for coding tasks where consistency matters. For testing, implement handling for `redacted_thinking` blocks. These will occur in production when Claude's safety systems trigger, and your app shouldn't crash. Consider adding a development flag to force redacted thinking responses during testing using Anthropic's documented magic string.

#best-practices #api #configuration #streaming

Enjoyed this article?

Get the weekly AI intelligence briefing

Products & Launches3 shared topics

Claude Code's /voice Mode: The Hybrid Workflow That Actually Works

Products & Launches3 shared topics

Anthropic Launches Claude Code Auto Mode: AI Can Now Make Permission Decisions During Code Execution

Products & Launches2 shared topics

Cowork Hardcodes 'Medium' Effort for Opus 4.6, Ignoring Your Settings

Products & Launches2 shared topics

Anthropic's 'Auto-dream' Feature for Claude Code Automatically Compacts and Indexes Project Memory

Products & Launches2 shared topics

Claude Code Introduces Interactive /init Command to Automate Project Configuration

Products & Launches2 shared topics

Extended Thinking's Two-Block Response: What Claude Code Users Need to Know

What Changed — Extended Thinking's Response Structure

What It Means For Your Claude Code Workflows

Streaming Parsers Need Block Awareness

Thinking Blocks Are Read-Only

Redacted Thinking Happens

Try It Now — Practical Configuration

Budget Tokens vs. Max Tokens

When to Enable Extended Thinking

Implementation Checklist for Claude Code Users

AI Analysis

Related Articles

Claude Code's /voice Mode: The Hybrid Workflow That Actually Works

Anthropic Launches Claude Code Auto Mode: AI Can Now Make Permission Decisions During Code Execution

Cowork Hardcodes 'Medium' Effort for Opus 4.6, Ignoring Your Settings

Anthropic's 'Auto-dream' Feature for Claude Code Automatically Compacts and Indexes Project Memory

Claude Code Introduces Interactive /init Command to Automate Project Configuration

How to Build a Real Project on Claude Code's Free Plan (Without the Pain)

More in AI Research

VHS: Latent Verifier Cuts Diffusion Model Verification Cost by 63.3%, Boosts GenEval by 2.7%

Benchmark: Hierarchical Multi-Agent LLM Architecture Hits F1 0.921 at 1.4x Cost for Financial Document Extraction

Ego2Web Benchmark Bridges Egocentric Video and Web Agents, Exposing Major Performance Gaps