
Technique · inference

PagedAttention (vLLM)

A memory-management scheme for the KV cache modeled on OS paging: the cache is stored in fixed-size, non-contiguous blocks indexed by per-sequence block tables, eliminating fragmentation and enabling high-throughput serving.
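Under the hood, the engine divides GPU memory into fixed-size KV blocks and keeps a per-sequence block table mapping logical positions to physical blocks, analogous to an OS page table. The sketch below illustrates that bookkeeping in Python; the class and method names and the pool size are illustrative assumptions, not vLLM's internal API (the block size of 16 tokens matches vLLM's default).

```python
# A minimal sketch of the paging idea behind PagedAttention, not vLLM's
# actual implementation. Names here are illustrative.

BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default)

class BlockManager:
    """Maps each sequence's logical KV blocks to physical blocks in a
    fixed pool, so memory is allocated in pages rather than as one
    contiguous slab per sequence."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def ensure_slots(self, seq_id: int, num_tokens: int) -> None:
        """Map enough physical blocks to seq_id to hold num_tokens."""
        table = self.block_tables.setdefault(seq_id, [])
        blocks_needed = -(-num_tokens // BLOCK_SIZE)  # ceiling division
        while len(table) < blocks_needed:
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted; scheduler must preempt")
            table.append(self.free_blocks.pop())  # any free block works: no contiguity needed

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool (no fragmentation)."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

mgr = BlockManager(num_physical_blocks=4)
mgr.ensure_slots(seq_id=0, num_tokens=20)  # 20 tokens -> 2 blocks
print(mgr.block_tables[0])                 # e.g. [3, 2]: non-contiguous is fine
mgr.free(0)
```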

Origin: UC Berkeley, 2023-09 · Read origin paper →
Also known as: PagedAttention, Paged KV Cache
Products deploying: 1
Avg research → prod: 3y
First commercial deploy: 3y

Deployment timeline

  1. Mistral Small 4

    Deployed 2026-03-16 · Velocity 3y

    Mistral recommends vLLM for serving, which uses PagedAttention; a brief serving sketch follows this timeline.

    Confidence: high
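For concreteness, serving a model with vLLM's offline Python API looks roughly like the following. The model id is a placeholder (the section does not give Mistral Small 4's checkpoint name), and PagedAttention is applied automatically by the engine.

```python
# A hedged sketch of vLLM's offline serving API; the model id is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")  # any Hugging Face model id
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
print(outputs[0].outputs[0].text)
# PagedAttention is on by default: the engine manages the KV cache in
# fixed-size blocks (tunable via the gpu_memory_utilization and
# block_size engine arguments).
```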