Technique · architecture
Rotary Position Embedding (RoPE)
A relative-position encoding that rotates query/key vector pairs by position-dependent angles (equivalently, a rotation in complex space), so attention scores depend only on the offset between tokens; this gives transformers better length extrapolation than absolute sinusoidal embeddings.
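A minimal NumPy sketch of the idea, under illustrative assumptions (the toy head dimension, the base of 10000, and the example positions are placeholders, not taken from any model listed below). It rotates consecutive dimension pairs of a query and a key by position-dependent angles and checks that their dot product depends only on the relative offset:

```python
# Minimal RoPE sketch (NumPy). Head dimension, base, and positions are
# illustrative assumptions, not details of any specific model below.
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate consecutive dimension pairs of x by position-dependent angles."""
    d = x.shape[-1]                       # head dimension, must be even
    half = d // 2
    freqs = base ** (-np.arange(half) / half)  # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]   # split into (even, odd) pairs
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Attention score depends only on the relative offset between positions:
s1 = rope(q, pos=7) @ rope(k, pos=3)      # offset 4
s2 = rope(q, pos=104) @ rope(k, pos=100)  # offset 4, shifted positions
print(np.isclose(s1, s2))                 # True
```

Production implementations differ in pairing convention (interleaved pairs vs. split halves) and in how the rotation interacts with the KV cache, but the relative-offset property shown above is what the technique relies on for its length-extrapolation behaviour.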
Deployment timeline
- Llama 4 Scout · high
Deployed 2025-04-05 · Velocity 4y
“Based on Llama 3 architecture which uses RoPE; Llama 4 Scout is a direct evolution.”
- Llama 4 Maverick · high
Deployed 2025-04-05 · Velocity 4y
“Llama family models consistently use RoPE. Llama 4 is a direct successor.”
- Claude Opus 4.6 · medium
Deployed 2026-02-16 · Velocity 5y
“Anthropic's research mentions using rotary position embeddings (RoPE) in transformer architectures.”
- GPT-4o · high
Deployed 2026-02-16 · Velocity 5y
“GPT models since GPT-3 use rotary position embeddings (RoPE). GPT-4o's architecture is a direct evolution.”
- GPT-5 · high
Deployed 2026-02-16 · Velocity 5y
“RoPE is a standard positional encoding used in modern Transformer LLMs, including GPT series.”
- Claude 3 · high
Deployed 2026-02-18 · Velocity 5y
“Claude 3 uses Rotary Position Embeddings (RoPE) for positional encoding, per technical details.”
- Gemini 3.1 · medium
Deployed 2026-02-20 · Velocity 5y
“Gemini models use Rotary Position Embeddings (RoPE) for position encoding.”
- Claude 3.5 Sonnet · high
Deployed 2026-02-23 · Velocity 5y
“Anthropic's Claude models use rotary position embeddings (RoPE) for position encoding.”
- Claude Haiku 4.5 · high
Deployed 2026-02-25 · Velocity 5y
“Claude models use rotary position embeddings (RoPE) for positional encoding.”
- GPT-5.3 · medium
Deployed 2026-02-26 · Velocity 5y
“RoPE is a standard position encoding in modern LLMs; GPT-5.3 likely uses it for better length extrapolation.”
- Claude 4.5 · medium
Deployed 2026-02-26 · Velocity 5y
“Anthropic's Claude models use rotary position embeddings (RoPE) for positional encoding.”
- Gemini 3 Flash · high
Deployed 2026-02-27 · Velocity 5y
“Gemini models use rotary position embeddings (RoPE), as confirmed in the Gemini 1.5 technical report.”
- GPT-OSS-120B · medium
Deployed 2026-03-02 · Velocity 5y
“As a large language model in the GPT lineage, it almost certainly uses Rotary Position Embedding (RoPE), which is standard in modern transformer architectures.”
- Grok 4.20 · medium
Deployed 2026-03-02 · Velocity 5y
“Grok models are based on the Transformer architecture, which commonly uses RoPE for position encoding.”
- Kimi K2.5 · medium
Deployed 2026-03-04 · Velocity 5y
“Most modern LLMs use RoPE for position encoding; Kimi K2.5's long-context capability aligns with this.”
- Gemini 3.1 Flash-Lite · high
Deployed 2026-03-05 · Velocity 5y
“Gemini models use rotary position embeddings (RoPE) for positional encoding.”
- Mistral Small 4 · high
Deployed 2026-03-16 · Velocity 5y
“Mistral models use Rotary Position Embeddings (RoPE).”
- GLM-5.1 · high
Deployed 2026-03-21 · Velocity 5y
“GLM-5.1 uses Rotary Position Embedding (RoPE) for positional encoding.”
- Qwen 3.6 · high
Deployed 2026-03-31 · Velocity 5y
“Qwen models use Rotary Position Embedding (RoPE) for positional encoding.”
- GPT-5.4-Cyber · medium
Deployed 2026-04-16 · Velocity 5y
“GPT models use Rotary Position Embeddings (RoPE) for positional encoding.”
- Claude Opus 4.7 · high
Deployed 2026-04-16 · Velocity 5y
“Anthropic's Claude 3 model card mentions using rotary position embeddings (RoPE). This is a standard architectural component for their models.”