Skip to content
gentic.news — AI News Intelligence Platform

Technique · multimodal

CLIP (Contrastive Language-Image Pretraining)

Dual-encoder model trained on 400M image-caption pairs to align image and text embeddings, enabling zero-shot visual classification.

Origin: OpenAI, 2021-02Read origin paper →Also known as: CLIP
1
Products deploying
4y
Avg research → prod
4y
First commercial deploy

Deployment timeline

  1. Llama 4 Scout

    Deployed 2025-04-05 · Velocity 4y

    Multimodal (text+image) capability suggests use of vision-language alignment similar to CLIP.

    medium