LLM Architecture Gallery Compiles 38 Model Designs from 2024-2026 with Diagrams and Code

A new open-source repository provides annotated architecture diagrams, key design choices, and code implementations for 38 major LLMs released between 2024 and 2026, including DeepSeek V3, Qwen3 variants, and GLM-5 744B.

via @akshay_pachaar

What Happened

A new public repository called the "LLM Architecture Gallery" has been released, compiling technical documentation for 38 distinct large language model architectures launched between 2024 and 2026. The gallery, highlighted by AI engineer Akshay Pachaar, was created by Sebastian Raschka (@rasbt) with contributions from Pachaar during his time at Lightning AI.

The repository serves as a centralized technical reference, providing three core elements for each model:

  • An annotated architecture diagram visualizing the model's structure.
  • A breakdown of key design choices (e.g., attention mechanisms, normalization layers, activation functions).
  • A code implementation, likely in PyTorch, demonstrating the architecture.

The Models Covered

The gallery spans a wide range of models from major AI labs and companies, focusing on releases from the last two years. The list of 38 models includes:

  • Open-Source Foundation Models: Llama 3 8B, OLMo 2/3 variants (7B, 32B), Gemma 3 27B, Mistral Small 3.1 24B, SmolLM3 3B, GPT-OSS (20B, 120B).
  • Recent High-Performance Models: DeepSeek V3, DeepSeek V3.2, DeepSeek R1, Qwen3 series (4B to 235B-A22B), Qwen3.5 397B, GLM-4.5/4.7/5 (up to 744B).
  • Proprietary & Regional Models: Grok 2.5 270B, Kimi K2, Kimi Linear 48B-A3B, Xiaomi MiMo-V2-Flash 309B, Arcee AI Trinity Large 400B, Sarvam AI models (30B, 105B).
  • Announced/Upcoming Architectures: Models like Llama 4 Maverick and Nemotron 3 Super 120B-A12B, which represent published architectures from this period.

The repository is hosted on GitHub, accessible via the link provided in the source: https://github.com/rasbt/LLM-architecture-gallery.

Context

This project addresses a growing pain point in the fast-moving field of LLM research and development: fragmented and inconsistent documentation. While major models are often accompanied by academic papers or blog posts, the exact architectural details, layer configurations, and implementation nuances can be difficult to find, compare, or reproduce. A standardized gallery allows engineers and researchers to quickly understand design trends, compare architectural choices (like the use of grouped-query attention vs. multi-head attention), and have a verified code reference for experimentation or educational purposes.
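As a concrete example of the kind of design choice such a gallery makes comparable: grouped-query attention (GQA) reduces the key/value cache by sharing each KV head among a group of query heads, whereas classic multi-head attention gives every query head its own KV head. A minimal sketch of the head-index mapping in plain Python (head counts are illustrative, not taken from any specific model in the gallery):

```python
def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """Return the KV head index that query head `q_head` attends with.

    Grouped-query attention assigns each contiguous group of query heads
    to one shared KV head. Multi-head attention is the special case
    num_kv_heads == num_q_heads; multi-query attention is num_kv_heads == 1.
    """
    assert num_q_heads % num_kv_heads == 0, "query heads must divide evenly into groups"
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

# With 8 query heads sharing 2 KV heads, heads 0-3 use KV head 0
# and heads 4-7 use KV head 1:
mapping = [kv_head_for_query_head(q, num_q_heads=8, num_kv_heads=2) for q in range(8)]
print(mapping)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The practical payoff is that the KV cache shrinks by the group factor (here 4x), which is why GQA appears so often in the 2024-2026 designs the gallery covers.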

The period from 2024 to 2026 has seen rapid architectural innovation beyond the now-standard Transformer, with models experimenting with mixture-of-experts (MoE) configurations (e.g., DeepSeek V3), new attention variants, and alternative topologies. Having these designs cataloged in one place provides a valuable snapshot of this evolutionary phase.
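The core idea behind MoE configurations like those cataloged here is a learned router that activates only a few experts per token. Below is a generic top-k router sketched in plain Python; it is a textbook illustration, not DeepSeek V3's actual routing code, and real routers add load-balancing losses, capacity limits, and batched tensor math:

```python
import math

def top_k_route(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Generic top-k MoE routing: softmax over per-expert logits, keep the
    k highest-scoring experts, and renormalize their weights to sum to 1.
    Returns (expert_index, weight) pairs, highest weight first."""
    # numerically stable softmax
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # pick the k experts with the largest probabilities
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    weight_sum = sum(probs[i] for i in chosen)
    return [(i, probs[i] / weight_sum) for i in chosen]

# A token whose router logits favor experts 2 and 0 out of four experts:
print(top_k_route([1.0, -2.0, 3.0, 0.5], k=2))
```

Only the chosen experts' feed-forward blocks run for that token, which is how MoE models keep active parameters (e.g. the "A22B" in 235B-A22B) far below total parameters.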

AI Analysis

The LLM Architecture Gallery is a utility, not a research breakthrough, but its value to practitioners is direct and significant. For engineers building on top of or fine-tuning these models, having a clear, code-backed diagram eliminates hours of digging through papers and source code to answer basic questions about tensor shapes, normalization placement, or residual connections. It turns architecture from a descriptive concept into an executable specification.

From a research perspective, the gallery enables rapid comparative analysis. One could systematically track the adoption rate of techniques like RMSNorm, SwiGLU activations, or rotary positional embeddings (RoPE) across this model set. It makes the "design space" of modern LLMs tangible. The inclusion of models like Xiaomi's MiMo and Sarvam's offerings also provides easier visibility into architectural trends emerging from labs outside the US.

The main limitation is maintenance. The field moves quickly, and new variants emerge constantly; the gallery's usefulness will depend on its ability to stay current. Furthermore, while the code provides the skeleton, it may not include the exact hyperparameters, tokenizer details, or training data pipeline, which are equally critical for full replication. Nonetheless, as a starting point for understanding and implementing these architectures, it sets a new standard for open technical documentation in the community.
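To make one of those trackable techniques concrete: RMSNorm normalizes by the root-mean-square of the activations, with no mean subtraction and no bias, unlike LayerNorm. A plain-Python sketch (dimensions and the identity gain are illustrative, not drawn from any gallery model):

```python
import math

def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """RMSNorm: divide each element by the vector's root-mean-square,
    then apply a learned per-dimension gain. No mean subtraction and
    no bias term, which makes it cheaper than LayerNorm."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

x = [3.0, -4.0]       # RMS = sqrt((9 + 16) / 2) = sqrt(12.5)
weight = [1.0, 1.0]   # identity gain for illustration
y = rms_norm(x, weight)
# after normalization the output vector itself has RMS ~= 1
```

With a gallery in hand, checking which of the 38 models place this normalization before versus after each sub-layer becomes a matter of reading diagrams rather than auditing 38 codebases.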
Original source: x.com