Data poisoning
- Definition
- An attack that corrupts a model's training data to introduce backdoors, biases, or degraded performance. Data poisoning can be targeted (affecting specific outputs) or untargeted (generally degrading model quality).
- Why it matters
- As AI models train on increasingly large and diverse datasets, the attack surface for data poisoning grows. A sophisticated attacker can inject carefully crafted examples into public datasets, web pages, or code repositories that get scraped into training data. The poisoned model then behaves normally except when triggered by specific inputs, making detection extremely difficult. This is not hypothetical: researchers have demonstrated practical data poisoning attacks against production models. For enterprise AI deployments, data poisoning represents a supply chain security risk. You need to trust not just the model vendor, but the entire data pipeline that produced the model.
- In practice
- Researchers at ETH Zurich demonstrated in 2024 that poisoning just 0.01% of a large training dataset could cause targeted model failures, such as always recommending a specific product. The 'Nightshade' tool, developed at the University of Chicago, lets artists add invisible perturbations to their images that corrupt AI models trained on them. Google's research team showed that Wikipedia edits, which flow into many training datasets, could be weaponized for data poisoning. These findings have driven labs to invest in data provenance tracking, training data auditing, and robust training techniques that reduce sensitivity to corrupted samples.
We cover safety & governance every week.
Get the 5 AI stories that matter — free, every Friday.
Related terms
Pre-training data
The massive datasets used to train foundation models during the pre-training phase, typically composed of web crawls, books, academic papers, code repositories, and other text sources. Pre-training data quality and composition directly determine model capabilities.
Jailbreak
A technique for bypassing an AI model's safety guardrails to elicit outputs the model was trained to refuse, such as harmful instructions, restricted content, or system prompt leaks.
Red teaming
The practice of systematically probing an AI system to find vulnerabilities, biases, and failure modes before deployment. Red teaming is now standard practice at major AI labs and increasingly required by regulation.
AI safety
The interdisciplinary field focused on ensuring AI systems behave as intended and do not cause unintended harm. Encompasses alignment research, red teaming, content filtering, and policy advocacy.
Know the terms. Know the moves.
Get the 5 AI stories that matter every Friday — free.
Free forever. No spam.