Catastrophic forgetting
- Definition
- When a model loses previously learned knowledge after being trained on new data. Catastrophic forgetting is a key challenge in continual learning and one reason fine-tuning must be done carefully.
- Why it matters
- If you fine-tune a foundation model on your proprietary data and it forgets how to write coherent English, you have wasted your training budget. Catastrophic forgetting is the practical constraint that makes fine-tuning an art, not a science: you need enough new training signal to shift the model's behavior, but not so much that you overwrite its general capabilities. This trade-off explains why techniques like LoRA (which freezes most parameters) and careful learning-rate schedules exist. For teams building custom models, understanding forgetting risk is the difference between a successful deployment and an expensive failure.
- In practice
- In continual pre-training experiments, researchers at Google showed that training Gemini on new domain data without replay caused a 5-15% degradation on general benchmarks. LoRA mitigates forgetting by only updating a small number of parameters, preserving the base model's knowledge. Elastic Weight Consolidation, originally developed by DeepMind, penalizes changes to parameters that were important for previous tasks. In practice, most enterprises use LoRA or QLoRA for fine-tuning specifically because full fine-tuning's forgetting risk is too high for models they cannot afford to break.
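The Elastic Weight Consolidation idea described above can be sketched as a quadratic penalty that makes it expensive to move parameters the previous task depended on. This is a minimal NumPy illustration with made-up numbers and hypothetical names, not DeepMind's implementation:

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1000.0):
    """Hypothetical minimal EWC penalty.

    Penalizes deviation of each parameter from its value after the
    previous task, weighted by a Fisher-information estimate of how
    important that parameter was for the old task.
    """
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

# Toy parameters after training on task A, plus their importance estimates.
old = np.array([1.0, -0.5, 2.0])
fisher = np.array([10.0, 0.1, 5.0])  # high value = important for task A

# Shifting an important parameter costs far more than an unimportant one,
# so gradient descent on (task-B loss + penalty) prefers to leave it alone.
shift_important = ewc_penalty(old + np.array([0.5, 0.0, 0.0]), old, fisher)
shift_unimportant = ewc_penalty(old + np.array([0.0, 0.5, 0.0]), old, fisher)
```

In training, this penalty is simply added to the new task's loss, steering updates toward parameters the old task did not rely on.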

Related terms
Fine-tuning
The process of continuing to train a pre-trained model on a smaller, task-specific dataset. Fine-tuning customizes model behavior for specific domains or formats and is a key part of most enterprise AI deployments.
LoRA (Low-Rank Adaptation)
A parameter-efficient fine-tuning technique that injects small trainable matrices into a frozen model. LoRA lets companies customize large models at a fraction of the cost of full fine-tuning.
Pre-training
The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
Transfer learning
Using knowledge learned from one task or dataset to improve performance on a different but related task. Transfer learning is why pre-trained foundation models can be fine-tuned for specialized applications.
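As a concrete picture of the LoRA entry above: the frozen weight matrix is left untouched, and a pair of small trainable matrices is added alongside it. This NumPy sketch uses toy dimensions and hypothetical names; it illustrates the structure, not any particular library's API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (d_out x d_in); LoRA never updates this.
d_out, d_in, r = 8, 16, 2  # r is the low rank, much smaller than d_out/d_in
W = rng.normal(size=(d_out, d_in))

# Trainable low-rank factors. B starts at zero so the adapted layer
# initially behaves exactly like the frozen base layer.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))

def lora_forward(x):
    # Output = frozen path + low-rank trainable path (W + B @ A applied to x).
    return W @ x + B @ (A @ x)

x = rng.normal(size=d_in)
base = W @ x            # what the frozen model computes
adapted = lora_forward(x)  # identical until A and B are trained
```

The cost saving comes from parameter count: here A and B hold 48 trainable values versus 128 in W, and the gap widens dramatically at real model scales.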