Technique · inference
PagedAttention (vLLM)
A memory-management scheme for the KV cache modeled on OS virtual-memory paging: the cache is split into fixed-size blocks mapped through per-sequence block tables, eliminating external fragmentation and enabling high-throughput serving.
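The core mechanism can be sketched in a few lines. The sketch below is illustrative only, not vLLM's actual implementation: `BlockAllocator`, `Sequence`, and `BLOCK_SIZE` are assumed names chosen for this example. It shows the paging idea the description refers to, with a free list of fixed-size physical blocks and a per-sequence block table that maps logical token positions to physical blocks, so no request needs a contiguous reservation.

```python
# Minimal sketch of PagedAttention-style KV-cache management (assumed names:
# BlockAllocator, Sequence, BLOCK_SIZE). Blocks are fixed-size "pages"; each
# sequence holds a block table from logical block index to physical block id.

BLOCK_SIZE = 16  # tokens stored per KV block (vLLM's default block size is 16)


class BlockAllocator:
    """Hands out fixed-size physical KV blocks from a shared free list."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def alloc(self) -> int:
        if not self.free:
            raise MemoryError("KV cache exhausted; preempt or swap a sequence")
        return self.free.pop()

    def free_block(self, block: int) -> None:
        self.free.append(block)


class Sequence:
    """Tracks one request's block table: logical block i -> physical block id."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []
        self.num_tokens = 0

    def append_token(self) -> None:
        # Allocate a new physical block only when the current one fills up,
        # so at most BLOCK_SIZE - 1 slots are ever wasted per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

    def release(self) -> None:
        for b in self.block_table:
            self.allocator.free_block(b)
        self.block_table.clear()


# Usage: two sequences draw from one physical pool, no contiguous space needed.
pool = BlockAllocator(num_blocks=8)
a, b = Sequence(pool), Sequence(pool)
for _ in range(20):
    a.append_token()   # a uses ceil(20 / 16) = 2 blocks
for _ in range(5):
    b.append_token()   # b uses 1 block
a.release()            # a's blocks return to the pool for immediate reuse
print(len(pool.free))  # -> 7 blocks free again
```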
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y
Deployment timeline
- Mistral Small 4 (confidence: high) · Deployed 2026-03-16 · Velocity: 3y
“Mistral recommends vLLM for serving, which uses PagedAttention.”
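For context, serving through vLLM applies PagedAttention automatically; no attention-level code is required from the user. A minimal offline-inference example follows. The model name and parameter values are illustrative choices, not an official Mistral configuration; `block_size` is vLLM's engine argument for the KV page size, and `gpu_memory_utilization` sets how much VRAM the engine may claim for weights plus KV blocks.

```python
from vllm import LLM, SamplingParams

# Illustrative configuration: model id and tuning values are assumptions,
# not a recommended production setup.
llm = LLM(
    model="mistralai/Mistral-Small-Instruct-2409",
    gpu_memory_utilization=0.90,  # fraction of VRAM for weights + KV blocks
    block_size=16,                # tokens per PagedAttention KV block
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
```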