Training
- Definition
- The process of teaching a neural network by feeding it data and adjusting its parameters to minimize prediction errors. Training frontier models now costs $100M+ and takes months on thousands of GPUs.
- Why it matters
- Training is the most capital-intensive activity in AI and the primary barrier to entry for new model providers. Understanding training economics helps you evaluate which organizations can credibly compete at the frontier and which are dependent on models trained by others. Training costs are determined by three factors: data volume, model size, and hardware costs. The training compute budget for frontier models has been doubling approximately every 6-10 months, from $5M (GPT-3) to $100M+ (GPT-4) to projected $1B+ (next-generation models). This exponential growth in training costs is creating an oligopoly of organizations that can afford frontier training runs.
- In practice
- Meta's Llama 3 405B was trained on 16,000 H100 GPUs using 15.6 trillion tokens, a process that took several months and cost an estimated $100-200M in compute alone. Anthropic's Claude and Google's Gemini involved similar-scale training runs. Training infrastructure involves not just GPUs but also high-bandwidth networking (InfiniBand or RoCE), distributed training frameworks (PyTorch FSDP, Megatron-LM), and custom data pipelines. The operational challenges are immense: hardware failures, training instabilities, checkpoint management, and cluster utilization optimization all require dedicated infrastructure teams. A single GPU failure during a large training run can waste millions in compute.
We cover data & training every week.
Get the 5 AI stories that matter — free, every Friday.
Related terms
Pre-training
The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
GPU (Graphics Processing Unit)
The hardware chip that powers AI training and inference. NVIDIA's H100 and B200 GPUs are the most sought-after compute in the industry, with wait times and pricing driving major strategic decisions.
Parameter
A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
Scaling laws
Empirical relationships showing that model performance improves predictably as you increase data, compute, and parameters. Scaling laws are why labs are pouring billions into ever-larger training runs.
Know the terms. Know the moves.
Get the 5 AI stories that matter every Friday — free.
Free forever. No spam.