Scaling Laws
Scaling laws are empirical relationships describing how a neural network's performance (typically measured as test loss) improves predictably, following a power law, as model size (number of parameters), training dataset size, and compute budget grow. Pioneered by Kaplan et al. (2020) and refined by the Chinchilla paper (Hoffmann et al., 2022), these laws let researchers forecast model quality before training and decide how to split a fixed compute budget between a larger model and more training tokens. Understanding scaling laws is now a prerequisite for planning and evaluating large-scale AI training runs.
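As a minimal sketch of that budget-allocation decision, assume the common approximation that training costs C ≈ 6ND FLOPs (N parameters, D tokens) and a fixed ratio of about 20 tokens per parameter (a widely cited Chinchilla rule of thumb, not an exact law). Under those assumptions the split has a closed form. The function name `chinchilla_allocation` is hypothetical.

```python
import math


def chinchilla_allocation(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget between parameters and tokens.

    ASSUMPTIONS (illustrative, not exact):
      * training cost C ~= 6 * N * D FLOPs
      * fixed tokens-per-parameter ratio D = r * N (r ~= 20 is a rule of thumb)
    """
    # Substituting D = r * N into C = 6 * N * D gives C = 6 * r * N**2.
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


if __name__ == "__main__":
    # Budget equal to 6 * 70e9 params * 1.4e12 tokens.
    n, d = chinchilla_allocation(5.88e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```

For a budget of 5.88e23 FLOPs (6 × 70B × 1.4T) this returns roughly 70B parameters and 1.4T tokens, matching the Chinchilla model's configuration. Changing `tokens_per_param` shifts the split; the fitted ratio varies across studies and data regimes.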
Frontier AI labs such as OpenAI, Google DeepMind, Anthropic, and Meta use scaling-law analysis to decide how to invest hundreds of millions of dollars in training compute, which makes it a core competency for ML researchers and infrastructure engineers. As labs grapple with diminishing returns from pre-training and the rise of inference-time scaling, the ability to interpret and apply newer scaling laws (e.g., Chinchilla, the Densing Law, inference-aware variants) is increasingly valued in hiring. Practitioners who can run scaling experiments, fit power-law curves, and derive compute-optimal training configurations are in high demand.
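Fitting a power-law curve usually means fitting a straight line in log-log space. Here is a minimal sketch using NumPy's `polyfit`, assuming a pure power law y = a·x^(-α) with no irreducible-loss term. The data is synthetic, generated for illustration, and `fit_power_law` is a hypothetical helper name.

```python
import numpy as np


def fit_power_law(x, y):
    """Fit y = a * x**(-alpha) by least squares in log-log space.

    A pure power law is a straight line after taking logs:
        log y = log a - alpha * log x
    Returns (a, alpha).
    """
    log_x = np.log(np.asarray(x, dtype=float))
    log_y = np.log(np.asarray(y, dtype=float))
    # polyfit returns coefficients highest degree first: [slope, intercept].
    slope, intercept = np.polyfit(log_x, log_y, deg=1)
    return float(np.exp(intercept)), float(-slope)


if __name__ == "__main__":
    # SYNTHETIC measurements: loss vs. parameter count, generated from a
    # known power law with mild multiplicative noise (not real results).
    rng = np.random.default_rng(0)
    sizes = np.logspace(6, 9, num=8)  # 1M .. 1B parameters
    true_a, true_alpha = 30.0, 0.08
    noise = np.exp(rng.normal(0.0, 0.01, sizes.size))
    losses = true_a * sizes ** (-true_alpha) * noise

    a, alpha = fit_power_law(sizes, losses)
    print(f"a ~ {a:.2f}, alpha ~ {alpha:.3f}")
    # Extrapolate one order of magnitude beyond the largest run.
    print(f"predicted loss at 1e10 params: {a * 1e10 ** (-alpha):.3f}")
```

Published fits often add an irreducible term (e.g. L = E + A/N^α), which a plain log-log line cannot capture; fitting that form needs a nonlinear optimizer such as `scipy.optimize.curve_fit`.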
🎓 Courses
Deep Learning Specialization
by Andrew Ng
Builds the deep learning foundations needed to reason about scaling: neural network architecture, hyperparameter tuning, and optimization. The course introduces scale as a driver of deep learning progress, explaining why bigger models trained on more data tend to outperform smaller ones.
MIT 6.S191 Introduction to Deep Learning — Scaling Lecture
by MIT EECS faculty
Covers the motivation for scaling, empirical scaling results from Kaplan et al. and the LLaMA models, and practical techniques for actually training at scale (data, tensor, and pipeline parallelism; MoE sparsity). Free and updated annually.
Stanford CS224N: Natural Language Processing with Deep Learning (2024)
by Christopher Manning and team
A rigorous academic NLP course whose lectures directly address how scaling laws shaped the LLM era, covering Kaplan and Chinchilla in the context of real model-development decisions. Full 2024 lecture videos are freely available on YouTube.
Understanding the Origins and Taxonomy of Neural Scaling Laws
by Yasaman Bahri (Stanford / Google)
A focused technical lecture on the theoretical grounding and taxonomy of scaling laws: rare content that goes beyond empirical recipes to explain why power laws arise in neural networks. Ideal for researchers.
📖 Books
Deep Learning at Scale
Sumanth Doddapaneni et al. · 2024
Published by O'Reilly in June 2024, this 448-page book devotes multiple chapters to the philosophy and history of scaling laws, compute-optimal training, and data-centric scaling. It is a current, practitioner-focused book that treats scaling laws as a first-class topic alongside distributed training.
🛠️ Tutorials & Guides
Chinchilla data-optimal scaling laws: In plain English
A clear, plain-English walkthrough of the Chinchilla paper's methodology and conclusions, including the compute-optimal token-to-parameter ratio. A useful reference for practitioners who need to apply Chinchilla results without a research background.
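For reference, the Chinchilla paper's parametric approach models loss as a function of parameter count N and training tokens D with fitted constants E, A, B, α, β, and minimizes it under the approximate compute constraint C ≈ 6ND. Substituting D = C/(6N) and setting ∂L/∂N = 0 gives closed-form compute-optimal sizes:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}},
\qquad C \approx 6ND

N_{\text{opt}}(C) = G \left(\frac{C}{6}\right)^{\frac{\beta}{\alpha+\beta}},
\qquad
D_{\text{opt}}(C) = G^{-1} \left(\frac{C}{6}\right)^{\frac{\alpha}{\alpha+\beta}},
\qquad
G = \left(\frac{\alpha A}{\beta B}\right)^{\frac{1}{\alpha+\beta}}
```

When α and β are similar, both exponents are close to 0.5, so parameters and tokens grow in roughly equal proportion with compute; this is why a roughly constant token-to-parameter ratio emerges.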
Beyond Bigger Models: The Evolution of Language Model Scaling Laws
A 2024 overview tracing the progression from Kaplan through Chinchilla to inference-aware and efficiency-focused variants (Sardana et al., the Densing Law), giving practitioners a map of where the field stands and where it is heading.
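Inference-aware variants such as Sardana et al.'s account for the cost of serving a model, not just training it. This is a minimal sketch of that accounting, assuming the common approximations of about 6N FLOPs per training token (forward and backward pass) and 2N FLOPs per inference token (forward pass only). The function name `lifetime_flops`, the model sizes, the token counts, and the premise that both configurations reach similar quality are all hypothetical, chosen only for illustration.

```python
def lifetime_flops(n_params: float, train_tokens: float, inference_tokens: float) -> float:
    """Approximate total FLOPs over a model's lifetime (hypothetical helper).

    ASSUMPTIONS: ~6*N FLOPs per training token (forward + backward)
    and ~2*N FLOPs per inference token (forward only).
    """
    training = 6.0 * n_params * train_tokens
    inference = 2.0 * n_params * inference_tokens
    return training + inference


if __name__ == "__main__":
    serve = 2e13  # HYPOTHETICAL: 20T tokens generated over deployment
    # HYPOTHETICAL configs, ASSUMED (not computed here) to reach similar quality.
    big = lifetime_flops(70e9, 1.4e12, serve)    # compute-optimal-style split
    small = lifetime_flops(30e9, 4.0e12, serve)  # smaller model, trained longer
    print(f"big:   {big:.3e} FLOPs")
    print(f"small: {small:.3e} FLOPs")
```

Under these made-up numbers the smaller, longer-trained model costs about 1.92e24 total FLOPs versus 3.39e24, because inference dominates at high serving volume. A full inference-aware analysis would use a fitted loss model to confirm the quality match rather than assuming it.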
LLM Scaling Laws: Analysis from AI Researchers
A structured 2025 overview covering multiple scaling-law frameworks, including the Densing Law and MoE-specific laws, with comparisons useful for applied ML engineers deciding on training configurations.
Learning resources last updated: June 18, 2026