GLM-5.1
GLM-5.1, developed by Zhipu AI, is a next-generation foundation model featuring a 1-million-token context window and support for up to 128K output tokens.
Ingredient list
Invented by Nous Research · 2023-08 · Velocity 3y
“GLM-5.1 extends context length to 1M tokens using YaRN (Yet another RoPE extensioN) method.”
architecture · high

Invented by Google · 2023-05 · Velocity 3y
“GLM-5.1 architecture uses Grouped-Query Attention (GQA) to reduce KV cache memory.”
architecture · high

Invented by Stanford · 2022-05 · Velocity 4y
“GLM-5.1 implements FlashAttention-2 for efficient attention computation.”
inference · high

Invented by Google · 2022-03 · Velocity 4y
“GLM-5.1 can use self-consistency by sampling multiple reasoning paths.”
reasoning · medium

Invented by Google · 2022-01 · Velocity 4y
“GLM-5.1 demonstrates chain-of-thought reasoning capabilities in examples.”
reasoning · medium

Invented by Google · 2021-09 · Velocity 5y
“GLM-5.1 is instruction-tuned on diverse tasks following FLAN methodology.”
training · medium

Invented by Zhuiyi Technology · 2021-04 · Velocity 5y
“GLM-5.1 uses Rotary Position Embedding (RoPE) for positional encoding.”
architecture · high
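The RoPE ingredient can be illustrated in a few lines of NumPy. This is a minimal sketch of standard Rotary Position Embedding, not GLM-5.1's exact implementation; the base of 10000 is the conventional default.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Apply Rotary Position Embedding to x of shape (seq, dim).

    Each pair of dimensions (2i, 2i+1) is rotated by the angle
    pos * theta_i, where theta_i = base ** (-2i / dim), so relative
    offsets between tokens become rotations in each 2-D subspace.
    """
    seq, dim = x.shape
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)    # (half,)
    angles = positions[:, None] * theta[None, :]      # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The defining property is that the dot product of a rotated query and key depends only on their relative offset: shifting both positions by the same amount leaves q·k unchanged.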
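The YaRN ingredient builds on RoPE by remapping its frequencies for longer contexts. The sketch below is a heavily simplified illustration of YaRN's core "NTK-by-parts" idea (interpolate low-frequency bands, leave high-frequency bands untouched, ramp linearly in between), not the full YaRN recipe, which also rescales attention temperature. The original context length of 4096 and the ramp boundaries `low`/`high` are illustrative assumptions, not GLM-5.1's actual settings.

```python
import numpy as np

def yarn_theta(dim, scale=8.0, base=10000.0, low=1.0, high=32.0):
    """Simplified YaRN-style remapping of RoPE frequencies (illustrative).

    For each frequency theta_i, count how many full rotations it completes
    over the assumed original context window. High-frequency dims (many
    rotations, local detail) are left unchanged; low-frequency dims are
    interpolated by 1/scale; dims in between get a linear ramp.
    """
    half = dim // 2
    theta = base ** (-np.arange(half) * 2.0 / dim)
    L = 4096  # assumed original training context (illustrative)
    rotations = L * theta / (2 * np.pi)
    ramp = np.clip((rotations - low) / (high - low), 0.0, 1.0)
    # ramp = 1 -> keep theta; ramp = 0 -> pure interpolation theta/scale
    return theta * (ramp + (1 - ramp) / scale)
```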
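The GQA ingredient can be sketched directly: query heads are grouped, and each group shares one key/value head, which shrinks the KV cache by the ratio of query heads to KV heads. A minimal NumPy sketch, not GLM-5.1's actual attention code:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Grouped-Query Attention: q has n_q heads, k/v have fewer
    n_kv_heads; each KV head serves a contiguous group of query heads,
    shrinking the KV cache by a factor of n_q / n_kv_heads.

    q: (n_q, seq, d), k/v: (n_kv_heads, seq, d)
    """
    n_q, seq, d = q.shape
    group = n_q // n_kv_heads
    # broadcast each KV head to its group of query heads
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With `n_kv_heads` equal to the number of query heads this reduces to standard multi-head attention; with `n_kv_heads=1` it is multi-query attention.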
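The FlashAttention ingredient rests on the online-softmax trick: attention is computed block by block with a running max and normalizer, so the full seq-by-seq score matrix is never materialized. The sketch below shows that trick for a single query row in NumPy; the real FlashAttention-2 is a fused GPU kernel, which this does not attempt to reproduce.

```python
import numpy as np

def attention_online(q, k, v, block=2):
    """One query row of attention, computed block-by-block.

    Keeps a running max m, running softmax normalizer l, and a running
    weighted sum of v, rescaling the accumulator as each new block
    arrives. q: (d,), k/v: (seq, d).
    """
    d = q.shape[0]
    m = -np.inf          # running max of scores
    l = 0.0              # running softmax normalizer
    acc = np.zeros(d)    # running weighted sum of v rows
    for start in range(0, k.shape[0], block):
        s = k[start:start + block] @ q / np.sqrt(d)  # block scores
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)  # rescale old contributions
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ v[start:start + block]
        m = m_new
    return acc / l
```

The result matches ordinary softmax attention exactly; the gain in the real kernel comes from never writing the score matrix to memory.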
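The self-consistency ingredient is simple enough to sketch end to end: sample several chain-of-thought completions at nonzero temperature, extract each final answer, and majority-vote. The `sample_fn` interface below (returning a reasoning string and an answer) is an assumed stand-in for a model call, not GLM-5.1's API.

```python
from collections import Counter

def self_consistency(sample_fn, n_samples=5):
    """Self-consistency decoding: sample several reasoning paths and
    return the most common final answer.

    sample_fn: assumed interface, a zero-argument callable returning
    (reasoning, answer); in practice this would be a temperature-sampled
    model call.
    """
    answers = [sample_fn()[1] for _ in range(n_samples)]
    winner, _ = Counter(answers).most_common(1)[0]
    return winner

# Usage with a stub sampler that mostly converges on "4":
_stub = iter([("r1", "4"), ("r2", "5"), ("r3", "4"), ("r4", "4"), ("r5", "3")])
majority = self_consistency(lambda: next(_stub), n_samples=5)
```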
This recipe is part of the gentic.news Deployment Atlas. Every ingredient has an origin paper + evidence. Methodology is public. Dataset is CC BY 4.0.