Scaling laws
- Definition
- Empirical relationships showing that model performance improves predictably as you increase data, compute, and parameters. Scaling laws are why labs are pouring billions into ever-larger training runs.
- Why it matters
- Scaling laws are the most important empirical finding in modern AI. They show that model quality improves as a smooth, predictable function of compute investment, with no clear signs of plateauing (so far); the canonical power-law forms are sketched after this list. This predictability is what justifies billion-dollar training runs: if you can reliably predict that 10x more compute yields a measurably better model, the investment becomes an engineering problem rather than a research gamble. But scaling laws also have limits: they predict training loss and, roughly, benchmark scores, not real-world utility. A model that scores 5% higher on benchmarks is not necessarily 5% more useful in production. The big strategic question is whether scaling laws continue to hold, or whether we are approaching diminishing returns.
- In practice
- The Chinchilla paper (Hoffmann et al., 2022) from DeepMind established that compute-optimal training means scaling data and parameters together, not just parameters. This finding redirected billions in industry investment: instead of training ever-larger models on relatively fixed datasets, labs began investing just as heavily in data collection and curation. Kaplan et al.'s original scaling laws (2020) from OpenAI showed power-law relationships between compute and loss. In practice, labs use scaling laws to predict the performance of large training runs by extrapolating from smaller pilot runs (see the sketch below), avoiding millions of dollars in failed experiments. The debate over whether scaling laws are plateauing or will continue to hold for another decade is the most consequential disagreement in AI strategy.
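For reference, the canonical forms look roughly like this; the exponents are the approximate figures reported in the papers cited above, not exact constants:

```latex
% Kaplan et al. (2020): test loss falls as a power law in training compute C
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}, \qquad \alpha_C \approx 0.05

% Hoffmann et al. (2022), Chinchilla: for a fixed compute budget C, the
% compute-optimal parameter count N and training-token count D grow together
N_{\mathrm{opt}} \propto C^{0.5}, \qquad D_{\mathrm{opt}} \propto C^{0.5}
% (roughly 20 training tokens per parameter as a rule of thumb)
```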
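And a minimal sketch of the extrapolation workflow, assuming hypothetical (compute, loss) measurements from small pilot runs; the data points, units, and fitted constants below are invented for illustration:

```python
# Sketch: fit a scaling law to small pilot runs, then extrapolate to a
# larger compute budget before committing to the full training run.
# All numbers here are hypothetical.

import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, irreducible_loss, coeff, alpha):
    """Saturating power law: L(C) = E + A * C**(-alpha)."""
    return irreducible_loss + coeff * compute ** (-alpha)

# Hypothetical losses from small runs (compute in arbitrary units).
compute = np.array([1.0, 3.0, 10.0, 30.0, 100.0])
loss = np.array([3.10, 2.85, 2.65, 2.50, 2.38])

# Fit the three constants (E, A, alpha) to the pilot-run measurements.
(E, A, alpha), _ = curve_fit(scaling_law, compute, loss, p0=[1.5, 2.0, 0.1])

# Extrapolate an order of magnitude beyond the largest pilot run.
target_compute = 1000.0
predicted_loss = scaling_law(target_compute, E, A, alpha)

print(f"fitted exponent alpha ~ {alpha:.3f}")
print(f"predicted loss at {target_compute:g} units of compute ~ {predicted_loss:.2f}")
```

Fitting a form with an irreducible-loss term, as in the Chinchilla parameterization L(N, D) = E + A/N^a + B/D^b, rather than a pure power law avoids the implication that loss falls to zero at infinite compute.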
We cover models & architecture every week.
Get the 5 AI stories that matter — free, every Friday.
Related terms
Pre-training
The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
Compute overhang
A situation where available compute capacity grows faster than the algorithmic improvements needed to use it, creating a stockpile of unused potential. A sudden algorithmic breakthrough can then unlock rapid capability jumps.
Frontier model
The most capable AI model available at any given time, representing the current state of the art. Frontier models push the boundaries of what AI can do and are typically the most expensive to train and run.
Foundation model
A large, general-purpose model pre-trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models. The term signals massive upfront investment and wide applicability.