Recipe · Gemini 3.1 Flash-Lite
Google's Gemini 3.1 Flash-Lite is a cost-optimized AI model designed for high-volume production workloads, balancing speed and per-token cost for developers.
Ingredient list
Invented by Nous Research · 2023-08 · Velocity 3y
“Gemini 1.5 models feature a 1 million token context window, achieved via novel research on efficient attention and positional encoding.”
architecture · medium

Invented by Google · 2022-01 · Velocity 4y
“Gemini models are trained to reason step-by-step, and Gemini 1.5 Flash and Pro can be prompted to show their reasoning.”
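The step-by-step prompting claim above can be illustrated with a minimal sketch. The prompt wording here is hypothetical, not an official Gemini prompt format; it only shows the difference between asking directly and asking the model to show its reasoning.

```python
# A minimal chain-of-thought prompting sketch (illustrative prompt text,
# not an official Gemini format): the same question asked directly and
# with an explicit request to reason step by step.
question = "A train travels 60 km in 1.5 hours. What is its average speed?"

direct_prompt = question
cot_prompt = (
    question
    + "\nThink step by step, then give the final answer on the last line."
)

print(cot_prompt)
```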
reasoning · high

Invented by Google · 2021-09 · Velocity 5y
“Gemini models are instruction-tuned, building on the FLAN instruction-tuning methodology.”
training · high

Invented by Zhuiyi Technology · 2021-04 · Velocity 5y
“Gemini models use rotary position embeddings (RoPE) for positional encoding.”
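The RoPE claim above can be sketched in a few lines of pure Python (toy dimensions and values, not Gemini's actual implementation). Each consecutive pair of query/key components is rotated by a position-dependent angle; the payoff is that attention scores then depend only on the relative offset between positions.

```python
import math

def rope(vec, pos, base=10000.0):
    """Minimal rotary position embedding (RoPE) sketch: rotate each pair
    (x_i, x_{i+1}) by angle pos * base**(-i/d), where d is the vector dim."""
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out += [x * c - y * s, x * s + y * c]
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.8, -0.2, 0.1]

# Key RoPE property: the score depends only on the relative offset (m - n).
s1 = dot(rope(q, 7), rope(k, 3))    # positions 7 and 3  -> offset 4
s2 = dot(rope(q, 12), rope(k, 8))   # positions 12 and 8 -> offset 4
print(abs(s1 - s2) < 1e-9)  # True
```

Because rotations compose, the two scores agree to floating-point precision even though the absolute positions differ; this relative-position property is what makes RoPE attractive for long contexts.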
architecture · high

Invented by Google · 2017-06 · Velocity 9y
“Gemini models are based on the Transformer decoder architecture.”
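The defining feature of a Transformer decoder is causal self-attention: position i may only attend to positions j ≤ i. A minimal pure-Python sketch of the causal mask (toy 3-token scores, not Gemini's implementation):

```python
import math

def causal_self_attention(scores):
    """Minimal decoder-side causal mask sketch: mask future positions
    with -inf before the softmax, so their attention weight is zero."""
    n = len(scores)
    out = []
    for i in range(n):
        row = [scores[i][j] if j <= i else float("-inf") for j in range(n)]
        m = max(row)
        es = [math.exp(v - m) for v in row]
        t = sum(es)
        out.append([e / t for e in es])
    return out

# Toy 3-token score matrix; weights on future tokens come out exactly zero.
attn = causal_self_attention([[0.0, 1.0, 2.0],
                              [1.0, 0.0, 1.0],
                              [2.0, 1.0, 0.0]])
print(attn[0])  # [1.0, 0.0, 0.0] -- token 0 can only see itself
```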
architecture · high

Invented by Google · 2017-01 · Velocity 9y
“Gemini 1.5 Pro uses a Mixture-of-Experts (MoE) architecture. Flash-Lite is a distilled version of the larger MoE models.”
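The MoE claim above can be sketched with a toy top-k router (hypothetical shapes and weights, not Gemini's architecture). The router scores every expert for a token, keeps only the top-k, and mixes their outputs by softmaxed gate weights; only the selected experts execute, which is how MoE adds parameters without adding per-token compute.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    t = sum(es)
    return [e / t for e in es]

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Minimal top-k Mixture-of-Experts sketch (toy linear experts)."""
    # Router logits: one score per expert (a simple dot product here).
    logits = [sum(w * xi for w, xi in zip(rw, x)) for rw in router_weights]
    # Keep only the top_k experts by logit.
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:top_k]
    gates = softmax([logits[e] for e in top])
    # Only the selected experts run; their outputs are gate-weighted.
    out = [0.0] * len(x)
    for g, e in zip(gates, top):
        y = [sum(w * xi for w, xi in zip(row, x)) for row in expert_weights[e]]
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top

x = [1.0, 2.0]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]]  # 4 experts
experts = [[[1.0, 0.0], [0.0, 1.0]],   # identity map
           [[2.0, 0.0], [0.0, 2.0]],   # doubling map
           [[0.0, 0.0], [0.0, 0.0]],   # zero map
           [[1.0, 1.0], [1.0, 1.0]]]   # summing map
y, chosen = moe_layer(x, experts, router)
print(chosen)  # [1, 3] -- the two highest-scoring experts
```

Distillation, as the card notes for Flash-Lite, then trains a smaller dense model to match the outputs of a larger (here MoE) teacher.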
architecture · high
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.