gentic.news — AI News Intelligence Platform

Technique · reasoning

Test-Time Compute Scaling

Allocating more compute at inference time (longer reasoning chains, or multiple samples checked by a verifier) can outperform scaling model parameters, and is the basis for o1-style reasoning models.

Origin: Google DeepMind, 2024-08 · Also known as: Inference-time compute, Thinking tokens
Products deploying: 5 · Avg research → prod: 1.7y · First commercial deploy: 1.6y
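The "multiple samples" flavor of the technique is simple to state: draw several independent reasoning chains for the same prompt and majority-vote over their final answers (often called self-consistency). The sketch below is a minimal illustration, not any vendor's API; `sample` and `extract_answer` are hypothetical stand-ins for a model call and an answer parser.

```python
from collections import Counter
from typing import Callable


def self_consistency(prompt: str,
                     sample: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 16) -> str:
    """Spend extra inference compute by drawing several reasoning chains
    and returning the answer most of them agree on."""
    answers = []
    for _ in range(n_samples):
        chain = sample(prompt)                 # one sampled reasoning chain
        answers.append(extract_answer(chain))  # final answer parsed from the chain
    # Majority vote over the extracted answers.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    import random
    random.seed(0)
    # Toy stand-ins so the sketch runs without a real model:
    # the fake "model" answers 42 most of the time and 41 occasionally.
    fake_sample = lambda p: f"...reasoning... Answer: {random.choice(['42', '42', '42', '41'])}"
    fake_extract = lambda chain: chain.rsplit("Answer: ", 1)[-1]
    print(self_consistency("What is 6 * 7?", fake_sample, fake_extract, n_samples=8))
```

Accuracy generally improves as the sample budget grows, which is the trade the deployments below are making: more inference compute per query in exchange for better answers.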

Deployment timeline

  1. DeepSeek-R1
     Deployed 2026-03-17 · Velocity 1.6y · high
     Employs iterative refinement and multiple reasoning samples at inference time.
  2. Claude 3.5 Opus
     Deployed 2026-03-18 · Velocity 1.6y · medium
     Uses longer reasoning chains for complex problems, allocating more inference compute.
  3. GPT-Rosalind
     Deployed 2026-04-16 · Velocity 1.7y · high
     The model uses test-time compute scaling via multiple sampled reasoning paths and majority voting.
  4. Claude Opus 4.7
     Deployed 2026-04-16 · Velocity 1.7y · high
     The product description highlights 'xhigh thinking effort', which is Anthropic's terminology for allocating more compute for longer, more thorough reasoning chains at inference time.
  5. Kimi K2.6
     Deployed 2026-04-20 · Velocity 1.7y · high
     Supports up to 13h continuous reasoning for long-horizon tasks, allocating substantial compute at inference time.
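The technique description above also mentions a verifier-based variant: rather than voting over final answers, each sampled chain is scored by a verifier (a reward model, a unit test, or any programmatic check) and the highest-scoring candidate is kept. A minimal best-of-N sketch under that assumption; `sample` and `verifier_score` are hypothetical placeholders, not a real model or library API.

```python
import random
from typing import Callable, List


def best_of_n(prompt: str,
              sample: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n_samples: int = 8) -> str:
    """Draw several candidate reasoning chains and keep the one the
    verifier scores highest (best-of-N selection)."""
    candidates: List[str] = [sample(prompt) for _ in range(n_samples)]
    return max(candidates, key=lambda chain: verifier_score(prompt, chain))


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins so the sketch runs without a real model or reward model:
    # the fake verifier simply checks whether the chain ends in the right answer.
    fake_sample = lambda p: f"...reasoning... Answer: {random.choice(['4', '4', '5'])}"
    fake_score = lambda p, chain: 1.0 if chain.endswith("Answer: 4") else 0.0
    print(best_of_n("2 + 2 = ?", fake_sample, fake_score, n_samples=4))
```

Both sketches spend extra inference compute to buy accuracy; the deployments listed above differ mainly in how that budget is allocated, whether to many parallel samples or to a single longer reasoning chain.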