Technique · architecture

Mixture of Depths

A technique that lets individual tokens skip transformer layers they do not need, allocating compute adaptively according to each token's estimated importance.
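As a rough illustration of the idea, the sketch below routes only the highest-scoring tokens through a layer and lets the rest pass through unchanged. It is a minimal pure-Python sketch, not the origin paper's implementation: the names `mod_layer`, `router_weights`, `block_fn`, and `capacity` are hypothetical, and the choices of a linear router, top-k selection under a fixed capacity, and scaling the layer update by the router score are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mod_layer(
    tokens: List[Vector],
    router_weights: Vector,
    block_fn: Callable[[Vector], Vector],
    capacity: int,
) -> Tuple[List[Vector], List[int]]:
    """Apply `block_fn` to at most `capacity` tokens; the others skip the layer.

    Assumption: a linear router scores each token, and the top-`capacity`
    scores decide which tokens are processed.
    """
    # One scalar "importance" score per token (hypothetical linear router).
    scores = [dot(t, router_weights) for t in tokens]

    # Indices of the highest-scoring tokens, limited by the layer's capacity.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    selected = set(ranked[:capacity])

    outputs: List[Vector] = []
    for i, t in enumerate(tokens):
        if i in selected:
            # Processed token: residual plus the layer update, scaled by the
            # router score (an assumption of this sketch).
            update = block_fn(t)
            outputs.append([x + scores[i] * u for x, u in zip(t, update)])
        else:
            # Skipped token: carried forward unchanged on the residual path.
            outputs.append(list(t))
    return outputs, sorted(selected)


# Toy usage: four 2-dimensional tokens, capacity for two of them.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, -1.0]]
toy_outputs, toy_selected = mod_layer(
    toy_tokens,
    router_weights=[1.0, 0.5],
    block_fn=lambda v: [1.0 for _ in v],  # stand-in for attention + MLP
    capacity=2,
)
print(toy_selected)  # indices of the tokens that went through the layer
```

With a capacity below the sequence length, the cost of the layer scales with the number of selected tokens rather than with every token, which is where the compute saving would come from under these assumptions.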

Origin: Google DeepMind, 2024-04

Also known as: MoD

Products deploying

—

Avg research → prod

—

First commercial deploy

Deployment timeline

No verified deployments yet in our tracked product set.