What Changed — Extended Thinking's Response Structure
If you're using Claude Code's API with Extended Thinking enabled, you're getting two distinct content blocks in every response, not one. This isn't a new feature but a critical implementation detail many developers miss:
{
"content": [
{
"type": "thinking",
"thinking": "Let me reason through this step by step...",
"signature": "eyJhbGciOiJFZDI1NTE5..."
},
{
"type": "text",
"text": "The answer is 42."
}
]
}
This follows Anthropic's continued refinement of Claude's reasoning capabilities, part of their broader push into more sophisticated AI tooling that includes recent launches like Claude Code Auto Mode and the multi-agent harness system.
What It Means For Your Claude Code Workflows
Streaming Parsers Need Block Awareness
When streaming responses, you get thinking_delta events first, then text_delta events. If your parser doesn't distinguish between block types, you'll concatenate Claude's internal reasoning into your user-facing output. Many developers accidentally get this right by only reading the last content block—but that's luck, not correct implementation.
Thinking Blocks Are Read-Only
The signature field in thinking blocks is a cryptographic tamper seal. Claude verifies this signature when you send thinking blocks back as conversation history. You cannot edit thinking blocks—if you're building a UI that allows message editing, thinking blocks must be treated as read-only. This prevents developers from modifying Claude's reasoning to steer it in unsafe directions.
Redacted Thinking Happens
Sometimes you'll get:
{
"type": "redacted_thinking",
"data": "encrypted_content_here..."
}
This occurs when Claude's safety systems flag the thinking content. The encrypted data preserves context for follow-up messages without exposing sensitive reasoning. For testing, there's a magic string you can send to force redacted thinking responses—use it to ensure your app doesn't crash.
Try It Now — Practical Configuration
Budget Tokens vs. Max Tokens

Extended Thinking takes a budget_tokens parameter with a minimum of 1,024 tokens. The critical constraint: max_tokens must be greater than budget_tokens. If your thinking budget is 1,024, max_tokens must be at least 1,025—leaving just 1 token for the actual response.
Practical configuration:
{
"thinking": {
"type": "enabled",
"budget_tokens": 1024
},
"max_tokens": 4000
}
This gives 1,024 tokens for thinking and 2,976 tokens for the actual response.
When to Enable Extended Thinking
Anthropic's own advice: Don't reach for Extended Thinking first. Improve your prompts, run evaluations, and only enable thinking when accuracy isn't meeting requirements. Extended Thinking tokens are charged as output tokens, add latency, and come with trade-offs:
- Temperature is locked at 1.0 when Extended Thinking is enabled
- Thinking tokens count toward your total token usage
- For straightforward coding tasks, it's often overhead without benefit
Implementation Checklist for Claude Code Users
- Update streaming parsers to handle
thinking_deltaandtext_deltaseparately - Make thinking blocks read-only in any UI that allows message editing
- Handle
redacted_thinkingblocks gracefully—they will occur - Set
max_tokenssignificantly higher thanbudget_tokens(at least 3:1 ratio) - Test with the magic string to ensure redacted thinking doesn't break your app
- Only enable Extended Thinking for complex reasoning tasks, not routine coding
This structural understanding is essential as Anthropic continues expanding Claude's capabilities, with recent developments including Claude Code Auto Mode for permission decisions and multi-agent systems for complex software engineering tasks.







.webp&w=3840&q=75)