
Technique · inference

PagedAttention (vLLM)

A memory-management scheme for the KV cache modeled on OS paging: the cache is stored in fixed-size, non-contiguous blocks indexed by per-sequence block tables, eliminating fragmentation and enabling high-throughput serving.
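Under the hood, the engine divides GPU memory into fixed-size KV blocks and keeps a per-sequence block table mapping logical positions to physical blocks, analogous to an OS page table. The sketch below illustrates that bookkeeping in Python; the class and method names and the pool size are illustrative assumptions, not vLLM's internal API (the block size of 16 tokens matches vLLM's default).

```python
# A minimal sketch of the paging idea behind PagedAttention, not vLLM's
# actual implementation. Names here are illustrative.

BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default)

class BlockManager:
    """Maps each sequence's logical KV blocks to physical blocks in a
    fixed pool, so memory is allocated in pages rather than as one
    contiguous slab per sequence."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def ensure_slots(self, seq_id: int, num_tokens: int) -> None:
        """Map enough physical blocks to seq_id to hold num_tokens."""
        table = self.block_tables.setdefault(seq_id, [])
        blocks_needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        while len(table) < blocks_needed:
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; scheduler must preempt")
            table.append(self.free_blocks.pop())  # any free block works: no contiguity needed

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool (no fragmentation)."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

mgr = BlockManager(num_physical_blocks=4)
mgr.ensure_slots(seq_id=0, num_tokens=20)  # 20 tokens -> 2 blocks
print(mgr.block_tables[0])                 # e.g. [3, 2]: non-contiguous is fine
mgr.free(0)
```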

Origin: UC Berkeley, 2023-09 · Read origin paper →
Also known as: PagedAttention, Paged KV Cache
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y

Deployment timeline

  1. Mistral Small 4

    Deployed 2026-03-16 · Velocity 3y

    Mistral recommends vLLM for serving, which uses PagedAttention; a brief serving sketch follows this timeline.

    Confidence: high
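For concreteness, serving a model with vLLM's offline Python API looks roughly like the following. The model id is a placeholder (the section does not give Mistral Small 4's checkpoint name), and PagedAttention is applied automatically by the engine.

```python
# A hedged sketch of vLLM's offline serving API; the model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")  # any Hugging Face model id
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
# PagedAttention is on by default: the engine manages the KV cache in
# fixed-size blocks (tunable via the gpu_memory_utilization and
# block_size engine arguments).
```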