gentic.news — AI News Intelligence Platform

Technique · inference

Speculative Decoding

An inference technique in which a small draft model proposes several tokens ahead and a large target model verifies them in a single parallel pass, typically yielding a 2-3x speedup without loss of output quality.
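The draft-and-verify loop described above can be sketched in a few lines. This is a minimal illustration of the greedy-verification variant only, not any vendor's implementation: `target_next` and `draft_next` are hypothetical toy stand-ins for the large and small models, and the lookahead `k` is an assumed parameter. The sampling variant replaces the exact-match check with an accept/reject step on the two models' probabilities, which is not shown here.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "models": each maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(prefix: List[int]) -> int:
    """Stand-in for the large model: a deterministic rule over a 10-token vocab."""
    return (sum(prefix) * 7 + len(prefix)) % 10


def draft_next(prefix: List[int]) -> int:
    """Stand-in for the small model: matches the target except at every 4th position."""
    guess = target_next(prefix)
    return guess if len(prefix) % 4 else (guess + 1) % 10


def speculative_decode(
    prompt: List[int],
    n_new: int,
    k: int = 4,
    draft: NextToken = draft_next,
    target: NextToken = target_next,
) -> Tuple[List[int], int]:
    """Generate n_new tokens; return (tokens, number of target verification passes)."""
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively (cheap).
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target scores all k+1 positions. A real system would do this
        #    in one batched forward pass; here it is simulated with a loop.
        verified = [target(tokens + proposal[:i]) for i in range(k + 1)]
        target_passes += 1
        # 3. Accept the longest prefix on which both models agree, then append
        #    the target's own token at the first mismatch (or a bonus token
        #    when every proposal was accepted).
        n_ok = 0
        while n_ok < k and proposal[n_ok] == verified[n_ok]:
            n_ok += 1
        tokens.extend(proposal[:n_ok])
        tokens.append(verified[n_ok])
    return tokens[:goal], target_passes


def greedy_decode(prompt: List[int], n_new: int, target: NextToken = target_next) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(target(tokens))
    return tokens


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2, 3], n_new=20)
    print(out == greedy_decode([1, 2, 3], n_new=20), passes)
```

Because every accepted token equals what the target would have chosen itself, the output matches plain greedy decoding from the target model. The saving comes from needing fewer target passes than generated tokens, and it depends on how often the draft agrees with the target.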

Origin: Google, 2022-11
Also known as: Speculative Sampling, Draft-and-Verify
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y

Deployment timeline

  1. GPT-4o

    Deployed 2026-02-16 · Velocity 3y

    GPT-4o's speed improvements are consistent with inference optimizations such as speculative decoding, though OpenAI has not explicitly confirmed its use.

    Confidence: low