Reasoning model
- Definition
- An AI model specifically designed to perform multi-step reasoning, typically by generating an explicit chain of thought before producing a final answer. Reasoning models trade inference speed and cost for dramatically improved performance on complex problems.
- Why it matters
- Reasoning models arguably represent the most significant paradigm shift since the original GPT. Instead of optimizing only the pre-training phase, reasoning models also optimize what happens during inference, spending more compute thinking through problems. This opens a new scaling dimension: even if pre-training scaling laws plateau, inference-time compute scaling can continue to improve performance. For business applications, reasoning models unlock use cases that standard LLMs handle poorly: complex multi-step analysis, mathematical problem solving, advanced coding tasks, and strategic planning. The trade-off is cost and latency: a reasoning model might take 30 seconds and roughly $0.50 to solve a problem that a standard model attempts in 2 seconds for about $0.01, but the reasoning model is far more likely to get the right answer.
- In practice
- OpenAI's o1 (previewed September 2024) was the first commercial reasoning model, ranking in the 89th percentile on Codeforces competitive programming problems. DeepSeek-R1, released in January 2025, demonstrated that open-weight reasoning models could match o1 performance and sparked a wave of open reasoning model development. Anthropic's Claude added extended thinking for complex reasoning tasks. Google's Gemini 2.0 Flash Thinking followed. In enterprise deployments, reasoning models are used selectively for high-value tasks: financial analysis, legal document review, complex debugging, and scientific research, while standard models handle simpler interactions.
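The selective-deployment pattern described above can be sketched as a simple router. The model names, per-request costs, and keyword heuristic below are illustrative placeholders, not real models or pricing; production systems typically use a trained classifier rather than keywords.

```python
# Illustrative model router: send complex, high-value tasks to a reasoning
# model and everything else to a cheaper, faster standard model.
# Model names and costs are made-up placeholders.

REASONING_MODEL = {"name": "reasoning-large", "approx_cost_usd": 0.50}
STANDARD_MODEL = {"name": "standard-fast", "approx_cost_usd": 0.01}

COMPLEX_KEYWORDS = ("prove", "debug", "analyze", "plan", "derive")

def pick_model(task: str) -> dict:
    """Crude heuristic: route to the reasoning model only when the task
    looks multi-step; otherwise save ~50x on cost with the standard model."""
    if any(kw in task.lower() for kw in COMPLEX_KEYWORDS):
        return REASONING_MODEL
    return STANDARD_MODEL

print(pick_model("Debug this race condition in our payment service")["name"])
print(pick_model("What's the capital of France?")["name"])
```

The design point is that routing happens per request, so the expensive model's latency and cost are only paid where they buy accuracy.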
Related terms
Chain-of-thought (CoT)
A prompting technique that instructs a model to reason step by step before giving a final answer. CoT dramatically improves accuracy on math, logic, and multi-step problems and is now trained directly into many reasoning models rather than relying on prompting alone.
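In its simplest form, CoT is just a prompt wrapper. The minimal sketch below builds such a prompt; the exact wording is an illustrative assumption, and the result can be sent to any LLM API.

```python
# Chain-of-thought prompting: ask the model to show its reasoning
# before committing to an answer. This only builds the prompt text.

def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer "
        "on its own line prefixed with 'Answer:'."
    )

print(cot_prompt("A train travels 120 km in 1.5 hours. What is its average speed?"))
```

Asking for the answer on a delimited final line also makes the response easy to parse programmatically.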
Extended thinking
A model feature where the AI explicitly allocates additional inference compute to reason through complex problems step by step before producing a final answer, with the reasoning process visible to the user or developer.
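Concretely, an extended-thinking request is usually a normal chat request plus a thinking budget. The payload below is loosely modeled on Anthropic's Messages API; treat the model id as a placeholder and the exact field names as provider-specific assumptions.

```python
# Sketch of an extended-thinking request payload, loosely modeled on
# Anthropic's Messages API. Field names vary by provider; the model id
# here is a placeholder.

def build_request(prompt: str, thinking_budget_tokens: int = 10_000) -> dict:
    return {
        "model": "claude-example",   # placeholder model id
        "max_tokens": 16_000,        # should exceed the thinking budget
        "thinking": {
            "type": "enabled",
            "budget_tokens": thinking_budget_tokens,
        },
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Plan a three-phase database migration with rollback steps.")
print(req["thinking"]["budget_tokens"])
```

Raising the budget lets the model spend more tokens reasoning before it answers, which is exactly the cost/quality dial described above.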
Test-time compute
The practice of allocating additional compute during inference to improve output quality, rather than relying solely on the capabilities baked in during training. Reasoning models and extended thinking are the primary examples of test-time compute scaling.
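A simple form of test-time compute scaling that works even without a built-in reasoning phase is self-consistency: sample several answers and take a majority vote. The toy sketch below stubs out the model with a seeded random function that is right 60% of the time, purely to illustrate the mechanic.

```python
import random
from collections import Counter

def noisy_model(question: str, rng: random.Random) -> int:
    """Stand-in for an LLM answering a fixed arithmetic question
    (17 + 25 = 42) correctly about 60% of the time."""
    return 42 if rng.random() < 0.6 else rng.randint(30, 50)

def self_consistency(question: str, n_samples: int, seed: int = 0) -> int:
    """Spend more inference compute (n_samples independent calls)
    and majority-vote the answers; accuracy rises with n_samples."""
    rng = random.Random(seed)
    answers = [noisy_model(question, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 17 + 25?", n_samples=25))
```

One sample is right 60% of the time, but the majority over 25 samples is almost always right: the same model, better answers, paid for with extra inference compute.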
Frontier model
The most capable AI model available at any given time, representing the current state of the art. Frontier models push the boundaries of what AI can do and are typically the most expensive to train and run.