Technique · inference
Speculative Decoding
An inference technique where a small draft model proposes tokens and a large model verifies them in parallel, yielding 2-3x speedup without quality loss.
1
Products deploying
3y
Avg research → prod
3y
First commercial deploy
Deployment timeline
- GPT-4olow
Deployed 2026-02-16 · Velocity 3y
“GPT-4o's speed improvements align with inference optimizations like speculative decoding, though not explicitly confirmed.”