Test-time compute
- Definition
- The practice of allocating additional compute during inference to improve output quality, rather than relying solely on the capabilities baked in during training. Reasoning models and extended thinking are the primary examples of test-time compute scaling.
- Why it matters
- Test-time compute introduces a second scaling axis for AI capability. The first axis is training compute: bigger models trained on more data. The second is inference compute: more thinking time per problem. This is revolutionary because it means you can improve AI outputs without retraining the model, simply by spending more compute at inference time. For product developers, test-time compute creates a natural price-quality dial: simple questions get fast, cheap responses, while complex questions trigger extended reasoning at higher cost. For the industry, test-time compute scaling could sustain capability improvements even if training scaling laws begin to plateau.
- In practice
- OpenAI's o1 model demonstrated that spending 10-100x more inference compute on chain-of-thought reasoning dramatically improved performance on math, science, and coding tasks. On Codeforces competitive programming, o1 placed in the 89th percentile, compared to roughly the 11th percentile for GPT-4o. Anthropic's Claude extended thinking mode allocates variable compute based on problem complexity. DeepSeek-R1 brought test-time compute scaling to open-weight models. The economics are notable: a reasoning model might spend $0.50 solving a problem that a standard model attempts for $0.01, but if the reasoning model gets the right answer and the standard model does not, the cost is justified for high-value tasks.
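That cost argument is plain expected-value arithmetic. A minimal sketch, assuming illustrative accuracies (30% for the standard model, 85% for the reasoning model) and a $10 value for a correct answer; none of these accuracy figures are benchmarks:

```python
# Back-of-envelope expected-value comparison using the $0.50 vs $0.01
# figures above. Accuracy numbers are illustrative assumptions.
def expected_value(task_value, cost, accuracy):
    """Expected net value of attempting a task once."""
    return task_value * accuracy - cost

task_value = 10.00  # assumed value of a correct answer, in dollars
standard = expected_value(task_value, cost=0.01, accuracy=0.30)
reasoning = expected_value(task_value, cost=0.50, accuracy=0.85)
print(f"standard:  ${standard:.2f}")   # $2.99
print(f"reasoning: ${reasoning:.2f}")  # $8.00
```

The 50x cost difference disappears into rounding error once accuracy on a high-value task enters the calculation; for low-value or easy tasks, the inequality flips and the cheap model wins.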
Related terms
Reasoning model
An AI model specifically designed to perform multi-step reasoning, typically by generating an explicit chain of thought before producing a final answer. Reasoning models trade inference speed and cost for dramatically improved performance on complex problems.
Extended thinking
A model feature where the AI explicitly allocates additional inference compute to reason through complex problems step by step before producing a final answer, with the reasoning process visible to the user or developer.
Chain-of-thought (CoT)
A prompting technique that instructs a model to reason step by step before giving a final answer. CoT dramatically improves accuracy on math, logic, and multi-step problems, and reasoning models are now trained to perform it natively without explicit prompting.
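A minimal CoT prompt wrapper; the exact wording below is one common pattern, not a canonical template:

```python
# Wrap a question in a chain-of-thought instruction. The phrasing is
# one widely used pattern, not the only one that works.
def cot_prompt(question: str) -> str:
    return (
        f"Q: {question}\n"
        "Think step by step, then give the final answer on the last line "
        "prefixed with 'Answer:'.\n"
        "A:"
    )

print(cot_prompt("A train travels 120 km in 1.5 hours. What is its average speed?"))
```

The "Answer:" prefix makes the final answer easy to parse out of the reasoning trace, which matters when the chain of thought runs to hundreds of tokens.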
Inference cost
The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.
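Per-request cost follows directly from per-million-token rates. The rates and token counts below are illustrative; the one structural fact worth knowing is that reasoning models typically bill their hidden thinking tokens as output tokens, which is where test-time compute shows up on the invoice.

```python
# Per-request cost at per-million-token pricing. Rates and token
# counts are illustrative, not any provider's actual price list.
def request_cost(input_tokens, output_tokens, in_rate_per_m, out_rate_per_m):
    return input_tokens / 1e6 * in_rate_per_m + output_tokens / 1e6 * out_rate_per_m

# Same question, with and without 20k tokens of (billed-as-output) thinking.
plain = request_cost(1_000, 500, in_rate_per_m=3.00, out_rate_per_m=15.00)
thinking = request_cost(1_000, 500 + 20_000, in_rate_per_m=3.00, out_rate_per_m=15.00)
print(f"plain:    ${plain:.4f}")     # $0.0105
print(f"thinking: ${thinking:.4f}")  # $0.3105
```

This is the mechanism behind the $0.01-versus-$0.50 comparison above: identical rates, roughly 30x the cost, almost entirely from thinking tokens.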