Data & Training Deep Dive

Catastrophic forgetting

Definition
When a model loses previously learned knowledge after being trained on new data. Catastrophic forgetting is a key challenge in continual learning and one reason fine-tuning must be done carefully.
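
A toy sketch of the phenomenon, assuming PyTorch and scikit-learn are available (the dataset split, model, and training loop are all illustrative, not from any specific study): a small classifier is trained on one set of classes, then on a second set with no replay of the first, and its accuracy on the first set collapses.

```python
# A toy demonstration of catastrophic forgetting (illustrative only):
# train a small classifier on "task A" (digits 0-4), then on "task B"
# (digits 5-9) with no replay, and watch task-A accuracy collapse.
import torch
import torch.nn as nn
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
X = torch.tensor(X, dtype=torch.float32) / 16.0  # scale pixel values to [0, 1]
y = torch.tensor(y)

task_a = y < 5   # task A: digits 0-4
task_b = y >= 5  # task B: digits 5-9

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train(mask, steps=300):
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X[mask]), y[mask]).backward()
        opt.step()

def accuracy(mask):
    with torch.no_grad():
        return (model(X[mask]).argmax(dim=1) == y[mask]).float().mean().item()

train(task_a)
print(f"task A accuracy after training on A: {accuracy(task_a):.2f}")
train(task_b)  # fine-tune on task B, with no task-A replay
print(f"task A accuracy after training on B: {accuracy(task_a):.2f}")
```

In runs of this sketch, task-A accuracy typically falls from near 100% to near zero after the second phase, because nothing in the task-B objective preserves the weights task A depended on.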
Why it matters
If you fine-tune a foundation model on your proprietary data and it forgets how to write coherent English, you have wasted your training budget. Catastrophic forgetting is the practical constraint that makes fine-tuning an art rather than a science: you need enough training on new data to shift the model's behavior, but not so much that you overwrite its general capabilities. This trade-off is why techniques like LoRA (which freezes most of the model's parameters) and careful learning rate schedules exist. For teams building custom models, understanding forgetting risk is the difference between a successful deployment and an expensive failure.
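
To make the LoRA point concrete, here is a hedged sketch of the core idea, assuming PyTorch: the pretrained weight matrix is frozen, and only a low-rank update delta_W = B @ A is trained. The rank, alpha, and layer size below are illustrative defaults, not prescriptions.

```python
# A LoRA-style linear layer: the base weight is frozen, and only two small
# low-rank matrices (A, B) are trained, which is the mechanism that limits
# how far the model can drift from its pretrained behavior.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: B starts at zero, so training begins at the
        # unmodified base model and only gradually shifts its behavior.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
```

For a 4096x4096 layer at rank 8, well under 1% of the layer's parameters end up trainable, which is the mechanical reason the base model's general capabilities are largely preserved.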
In practice
In continual pre-training experiments, researchers at Google showed that training Gemini on new domain data without replay caused a 5-15% degradation on general benchmarks. LoRA mitigates forgetting by updating only a small number of parameters, preserving the base model's knowledge. Elastic Weight Consolidation (EWC), originally developed by DeepMind, penalizes changes to parameters that were important for previous tasks. In practice, most enterprises fine-tune with LoRA or QLoRA precisely because the forgetting risk of full fine-tuning is too high for models they cannot afford to break.
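
The EWC idea reduces to one extra loss term: changes to parameters the old task relied on, as measured by diagonal Fisher information, are penalized quadratically, so the optimizer routes the new task's updates through parameters the old task does not need. A minimal sketch, assuming PyTorch, with `old_params` and `fisher` as hypothetical precomputed inputs:

```python
# A minimal sketch of the EWC penalty from Kirkpatrick et al. (2017),
# assuming PyTorch. `old_params` (a snapshot of parameters after the old
# task) and `fisher` (diagonal Fisher information per parameter) are assumed
# to be precomputed on old-task data; both names are illustrative.
import torch

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    """Return (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2."""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return (lam / 2.0) * penalty

# During new-task training, the penalty is simply added to the task loss:
# loss = task_loss + ewc_penalty(model, old_params, fisher)
```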
