What Happened
A new public repository called the "LLM Architecture Gallery" has been released, compiling technical documentation for 38 distinct large language model architectures launched between 2024 and 2026. The gallery, highlighted by AI engineer Akshay Pachaar, was created by Sebastian Raschka (@rasbt) with contributions from Pachaar during his time at Lightning AI.
The repository serves as a centralized technical reference, providing three core elements for each model:
- An annotated architecture diagram visualizing the model's structure.
- A breakdown of key design choices (e.g., attention mechanisms, normalization layers, activation functions).
- A code implementation, likely in PyTorch, demonstrating the architecture.
The Models Covered
The gallery spans a wide range of models from major AI labs and companies, focusing on releases from the last two years. The list of 38 models includes:
- Open-Source Foundation Models: Llama 3 8B, OLMo 2/3 variants (7B, 32B), Gemma 3 27B, Mistral Small 3.1 24B, SmolLM3 3B, GPT-OSS (20B, 120B).
- Recent High-Performance Models: DeepSeek V3, DeepSeek V3.2, DeepSeek R1, Qwen3 series (4B to 235B-A22B), Qwen3.5 397B, GLM-4.5/4.7/5 (up to 744B).
- Proprietary & Regional Models: Grok 2.5 270B, Kimi K2, Kimi Linear 48B-A3B, Xiaomi MiMo-V2-Flash 309B, Arcee AI Trinity Large 400B, Sarvam AI models (30B, 105B).
- Other 2024-2026 Architectures: Models such as Llama 4 Maverick and Nemotron 3 Super 120B-A12B, rounding out the architectures published during this period.
The repository is hosted on GitHub, accessible via the link provided in the source: https://github.com/rasbt/LLM-architecture-gallery.
Context
This project addresses a growing pain point in the fast-moving field of LLM research and development: fragmented and inconsistent documentation. While major models are often accompanied by academic papers or blog posts, the exact architectural details, layer configurations, and implementation nuances can be difficult to find, compare, or reproduce. A standardized gallery lets engineers and researchers quickly spot design trends, compare architectural choices (such as grouped-query attention versus full multi-head attention), and work from a single code reference for experimentation or education.
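To make the grouped-query attention (GQA) versus multi-head attention (MHA) comparison concrete, the sketch below shows the core structural difference: in MHA every query head has its own key/value head, while in GQA groups of query heads share one key/value head, shrinking the KV cache. The head counts and the helper function are illustrative assumptions, not taken from any specific model in the gallery.

```python
def kv_head_for(query_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the index of the KV head that serves a given query head.

    Assumes query heads are split into equal contiguous groups,
    each group sharing one KV head (the usual GQA layout).
    """
    assert num_q_heads % num_kv_heads == 0, "query heads must divide evenly"
    group_size = num_q_heads // num_kv_heads
    return query_head // group_size

# MHA: 8 query heads, 8 KV heads -> one-to-one mapping.
mha_map = [kv_head_for(h, num_q_heads=8, num_kv_heads=8) for h in range(8)]
# -> [0, 1, 2, 3, 4, 5, 6, 7]

# GQA: 8 query heads share 2 KV heads -> groups of 4 query heads each.
gqa_map = [kv_head_for(h, num_q_heads=8, num_kv_heads=2) for h in range(8)]
# -> [0, 0, 0, 0, 1, 1, 1, 1]
```

With 2 KV heads instead of 8, the KV cache for this layer is 4x smaller, which is the practical motivation behind GQA in many recent models.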
The period from 2024 to 2026 has seen rapid architectural innovation within and beyond the now-standard Transformer, with models experimenting with mixture-of-experts (MoE) configurations (e.g., DeepSeek V3), new attention variants, and alternative topologies. Having these designs cataloged in one place provides a valuable snapshot of this evolutionary phase.
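The MoE idea mentioned above can be sketched in a few lines: a gating network scores all experts, only the top-k run per token, and their outputs are mixed with renormalized gate weights. This is a toy illustration of the general technique, not the routing used by DeepSeek V3 or any other specific model; the expert functions and all numbers are made up.

```python
import math

def top_k_routing(gate_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts,
    with softmax weights renormalized over just those k experts."""
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    exp_scores = [math.exp(gate_logits[i]) for i in top]
    total = sum(exp_scores)
    return [(i, s / total) for i, s in zip(top, exp_scores)]

# Four toy "experts": each is just a scalar function of the token input.
experts = [lambda x, m=m: m * x for m in (1.0, 2.0, 3.0, 4.0)]

def moe_layer(x, gate_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_routing(gate_logits, k))

# Experts 1 and 3 score highest, so only they are evaluated for this token.
y = moe_layer(1.0, gate_logits=[0.1, 2.0, 0.3, 1.5], k=2)
```

The appeal in large models is that total parameter count grows with the number of experts while per-token compute stays roughly constant, since only k experts execute.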