gentic.news — AI News Intelligence Platform

Technique · multimodal

LLaVA (Visual Instruction Tuning)

Projects CLIP image features into an LLM's token embedding space through a simple projector (a single linear layer in the original LLaVA), then instruction-tunes the model on GPT-4-generated visual conversations.

Origin: University of Wisconsin, 2023-04
Also known as: LLaVA
Products deploying: 3
Avg research → prod: 3y
First commercial deploy: 2.0y
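
The projection step named in the description can be sketched in a few lines. This is a minimal illustration, not the LLaVA codebase: the function names (`make_linear_projector`, `build_input_sequence`), the toy dimensions, and the random weights are all assumptions, and plain Python lists stand in for the tensors a real implementation would use. It shows frozen-encoder patch features being mapped by one linear layer to the LLM's embedding width and then joined with text token embeddings.

```python
import random


def make_linear_projector(vision_dim, llm_dim, seed=0):
    """Return a function that maps vision features to LLM-width vectors.

    Hypothetical helper: weights are random here; in training they would
    be learned while the vision encoder stays frozen.
    """
    rng = random.Random(seed)
    scale = vision_dim ** -0.5
    # weight has shape (llm_dim, vision_dim); bias has shape (llm_dim,)
    weight = [[rng.gauss(0.0, scale) for _ in range(vision_dim)]
              for _ in range(llm_dim)]
    bias = [0.0] * llm_dim

    def project(patch_features):
        # One output vector of length llm_dim per input patch.
        return [
            [sum(w * x for w, x in zip(row, feat)) + b
             for row, b in zip(weight, bias)]
            for feat in patch_features
        ]

    return project


def build_input_sequence(visual_tokens, text_embeddings):
    """Concatenate projected image tokens with text token embeddings.

    Assumption: image tokens are placed before the text for simplicity.
    """
    return visual_tokens + text_embeddings


# Toy sizes (assumptions): 4 image patches, 8-dim vision features,
# 16-dim LLM embeddings, 3 text tokens.
rng = random.Random(1)
patches = [[rng.random() for _ in range(8)] for _ in range(4)]
text = [[rng.random() for _ in range(16)] for _ in range(3)]

project = make_linear_projector(vision_dim=8, llm_dim=16)
sequence = build_input_sequence(project(patches), text)
print(len(sequence), len(sequence[0]))
```

After projection every element of the sequence has the same width, so the LLM can treat image patches and text tokens uniformly.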

Deployment timeline

  1. Llama 4 Scout

    Deployed 2025-04-05 · Velocity 2.0y

    A natively multimodal (text + image) open-weight model; its design resembles LLaVA's approach of projecting vision features into the LLM's token space.

    medium
  2. Kimi K2.5

    Deployed 2026-03-04 · Velocity 3y

    Kimi K2.5 is a multimodal model with vision capabilities; its design resembles LLaVA's approach of projecting visual features into the LLM's token space.

    medium
  3. Qwen 3.6

    Deployed 2026-03-31 · Velocity 3y

    Qwen 3.6 includes a multimodal version (Qwen-VL) that uses a vision encoder and projector.

    high
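
The velocity figures in the timeline appear to be the elapsed time between the origin date and each deployment date. The page does not state the formula, so the sketch below is an assumption: it takes the first day of the origin month as the start (the page gives only 2023-04) and divides elapsed days by 365.25. The names `ORIGIN`, `DEPLOYMENTS`, and `years_between` are illustrative.

```python
from datetime import date

# Assumption: the page lists only "2023-04", so day 1 is used here.
ORIGIN = date(2023, 4, 1)

# Deployment dates as listed in the timeline.
DEPLOYMENTS = {
    "Llama 4 Scout": date(2025, 4, 5),
    "Kimi K2.5": date(2026, 3, 4),
    "Qwen 3.6": date(2026, 3, 31),
}


def years_between(start, end):
    """Elapsed time in years, using an average year of 365.25 days."""
    return (end - start).days / 365.25


velocities = {name: years_between(ORIGIN, deployed)
              for name, deployed in DEPLOYMENTS.items()}
average = sum(velocities.values()) / len(velocities)

for name, years in velocities.items():
    print(f"{name}: {years:.1f}y")
print(f"average: {average:.1f}y")
```

Under these assumptions the first deployment comes out at about two years and the other two round to three, which matches the figures listed once rounded to whole years.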