gentic.news — AI News Intelligence Platform

Technique · reasoning

Test-Time Compute Scaling

Allocating more compute at inference time (longer reasoning chains, or multiple samples checked by a verifier) can outperform scaling model parameters, and is the basis for o1-style reasoning models.

Origin: Google DeepMind, 2024-08 · Also known as: Inference-time compute, Thinking tokens
Products deploying: 5 · Avg research → prod: 1.7y · First commercial deploy: 1.6y
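The "multiple samples" flavor of the technique is simple to state: draw several independent reasoning chains for the same prompt and majority-vote over their final answers (often called self-consistency). The sketch below is a minimal illustration, not any vendor's API; `sample` and `extract_answer` are hypothetical stand-ins for a model call and an answer parser.

```python
from collections import Counter
from typing import Callable


def self_consistency(prompt: str,
                     sample: Callable[[str], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 16) -> str:
    """Spend extra inference compute by drawing several reasoning chains
    and returning the answer most of them agree on."""
    answers = []
    for _ in range(n_samples):
        chain = sample(prompt)                 # one sampled reasoning chain
        answers.append(extract_answer(chain))  # final answer parsed from the chain
    # Majority vote over the extracted answers.
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    import random
    random.seed(0)
    # Toy stand-ins so the sketch runs without a real model:
    # the fake "model" answers 42 most of the time and 41 occasionally.
    fake_sample = lambda p: f"...reasoning... Answer: {random.choice(['42', '42', '42', '41'])}"
    fake_extract = lambda chain: chain.rsplit("Answer: ", 1)[-1]
    print(self_consistency("What is 6 * 7?", fake_sample, fake_extract, n_samples=8))
```

Accuracy generally improves as the sample budget grows, which is the trade the deployments below are making: more inference compute per query in exchange for better answers.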

Deployment timeline

  1. DeepSeek-R1
     Deployed 2026-03-17 · Velocity 1.6y · high
     Employs iterative refinement and multiple reasoning samples at inference time.
  2. Claude 3.5 Opus
     Deployed 2026-03-18 · Velocity 1.6y · medium
     Uses longer reasoning chains for complex problems, allocating more inference compute.
  3. GPT-Rosalind
     Deployed 2026-04-16 · Velocity 1.7y · high
     The model uses test-time compute scaling via multiple sampled reasoning paths and majority voting.
  4. Claude Opus 4.7
     Deployed 2026-04-16 · Velocity 1.7y · high
     The product description highlights 'xhigh thinking effort', which is Anthropic's terminology for allocating more compute for longer, more thorough reasoning chains at inference time.
  5. Kimi K2.6
     Deployed 2026-04-20 · Velocity 1.7y · high
     Supports up to 13h continuous reasoning for long-horizon tasks, allocating substantial compute at inference time.
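The technique description above also mentions a verifier-based variant: rather than voting over final answers, each sampled chain is scored by a verifier (a reward model, a unit test, or any programmatic check) and the highest-scoring candidate is kept. A minimal best-of-N sketch under that assumption; `sample` and `verifier_score` are hypothetical placeholders, not a real model or library API.

```python
import random
from typing import Callable, List


def best_of_n(prompt: str,
              sample: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n_samples: int = 8) -> str:
    """Draw several candidate reasoning chains and keep the one the
    verifier scores highest (best-of-N selection)."""
    candidates: List[str] = [sample(prompt) for _ in range(n_samples)]
    return max(candidates, key=lambda chain: verifier_score(prompt, chain))


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins so the sketch runs without a real model or reward model:
    # the fake verifier simply checks whether the chain ends in the right answer.
    fake_sample = lambda p: f"...reasoning... Answer: {random.choice(['4', '4', '5'])}"
    fake_score = lambda p, chain: 1.0 if chain.endswith("Answer: 4") else 0.0
    print(best_of_n("2 + 2 = ?", fake_sample, fake_score, n_samples=4))
```

Both sketches spend extra inference compute to buy accuracy; the deployments listed above differ mainly in how that budget is allocated, whether to many parallel samples or to a single longer reasoning chain.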