Technique · inference

Speculative Decoding

An inference technique in which a small draft model proposes several tokens ahead and a large target model verifies them in a single parallel pass, typically yielding a 2-3x speedup without changing the target model's output distribution.
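
A minimal sketch of one draft-and-verify step, assuming the standard speculative sampling acceptance rule: a drafted token is accepted with probability min(1, p/q), where q is the draft model's probability and p is the target model's; on rejection, a replacement is drawn from the normalized residual max(0, p - q). The function names (`sample`, `speculative_step`) and the toy probability dictionaries are hypothetical stand-ins for real models, not any library's API.

```python
import random


def sample(weights, rng):
    """Draw one token from a dict of token -> (unnormalized) weight."""
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens])[0]


def speculative_step(draft_probs, target_probs, draft_tokens, rng):
    """Verify drafted tokens and return the tokens actually emitted.

    draft_probs[i] and target_probs[i] are dicts of token -> probability
    at drafted position i. target_probs carries one extra entry, used to
    sample a bonus token when every drafted token is accepted.
    (Illustrative assumption: in a real system these come from one
    forward pass of each model.)
    """
    emitted = []
    for i, tok in enumerate(draft_tokens):
        q = draft_probs[i][tok]               # draft probability of the proposal
        p = target_probs[i].get(tok, 0.0)     # target probability of the proposal
        if rng.random() < min(1.0, p / q):
            emitted.append(tok)               # accepted: keep the drafted token
            continue
        # Rejected: resample from the residual max(0, p - q), then stop.
        residual = {
            t: max(0.0, pt - draft_probs[i].get(t, 0.0))
            for t, pt in target_probs[i].items()
        }
        emitted.append(sample(residual, rng))
        return emitted
    # All drafted tokens accepted: take one extra token from the target.
    emitted.append(sample(target_probs[len(draft_tokens)], rng))
    return emitted
```

When the draft and target distributions agree, every drafted token is accepted and the step emits one more token than was drafted; when they disagree, the step stops at the first rejection, which is why the speedup depends on how well the draft model matches the target.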

Origin: Google, 2022-11

Also known as: Speculative Sampling, Draft-and-Verify

Deployment timeline

GPT-4o
Deployed 2026-02-16 · Velocity 3y
“GPT-4o's speed improvements align with inference optimizations like speculative decoding, though not explicitly confirmed.”
Confidence: low