Models & Architecture Deep Dive

Benchmark gaming

Definition
The practice of optimizing a model's performance on specific benchmarks without corresponding improvements in general capability, whether through targeted training data, prompt engineering, or architectural shortcuts.
Why it matters
Benchmark gaming undermines the entire evaluation ecosystem. When labs optimize for benchmarks rather than for real-world utility, buyers make decisions based on misleading numbers. The most common forms are training on benchmark test data (contamination), cherry-picking evaluation conditions, and reporting only favorable benchmarks. For enterprise buyers, the upshot is that headline numbers in model announcements cannot be taken at face value: you need to run your own evals on your own data. Companies that invest in custom evaluation pipelines gain a significant advantage over those that rely on vendor-reported benchmarks for model selection.
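To make "run your own evals" concrete, here is a minimal sketch of an in-house eval harness in Python. The `model_answer` function and the JSONL record format are hypothetical stand-ins for your own inference call and data; real pipelines typically add rubric-based or LLM-judged scoring rather than exact-match accuracy.

```python
import json

def model_answer(prompt: str) -> str:
    """Hypothetical stand-in: call your model or vendor API here."""
    raise NotImplementedError

def run_eval(path: str) -> float:
    """Score a model against a JSONL file of {"prompt": ..., "expected": ...}
    records drawn from your own production data, not a public benchmark."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            prediction = model_answer(record["prompt"])
            correct += int(prediction.strip() == record["expected"].strip())
            total += 1
    return correct / total if total else 0.0

# Usage: accuracy = run_eval("internal_eval.jsonl")
```

Because the test set stays private and reflects your actual workload, it is far harder for a vendor's benchmark-tuned model to score well on it by accident.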
In practice
In 2024, researchers demonstrated that several open-source models had been trained on MMLU test data, inflating their scores by 5-10 percentage points. The Chatbot Arena addressed gaming by using live, unpredictable user queries. Meta's Llama team was transparent about benchmark contamination risks and published decontamination methods. The SWE-Bench benchmark for coding faced similar issues when models were found to have memorized solutions to popular GitHub issues. The community response has been to develop private, regularly rotated benchmarks and to weight human preference evaluations more heavily.
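Decontamination methods differ between labs, but a common baseline is checking for word-level n-gram overlap between training documents and benchmark test items. The sketch below illustrates that idea; the 13-gram threshold is an assumption loosely following published decontamination practice, not any specific lab's pipeline.

```python
def ngrams(text: str, n: int = 13) -> set:
    """Return the set of word-level n-grams in a text."""
    tokens = text.lower().split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(train_doc: str, test_items: list[str], n: int = 13) -> bool:
    """Flag a training document that shares any n-gram with a test item.
    n=13 is a commonly cited choice, but the right threshold depends on
    tokenization and domain; paraphrased contamination will slip through."""
    doc_grams = ngrams(train_doc, n)
    return any(doc_grams & ngrams(item, n) for item in test_items)
```

Overlap checks like this catch verbatim leakage but miss paraphrases, which is part of why the community has also moved toward private, rotated benchmarks.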
