Extended thinking
- Definition
- A model feature where the AI explicitly allocates additional inference compute to reason through complex problems step by step before producing a final answer, with the reasoning process visible to the user or developer.
- Why it matters
- Extended thinking represents a paradigm shift: instead of making models bigger, you make them think longer. This is significant because it decouples capability from model size, letting smaller models solve harder problems by spending more time on each one. For product developers, extended thinking opens up use cases (complex analysis, multi-step planning, mathematical proofs) that previously required human experts. The trade-off is cost and latency: extended thinking can use 10-100x more tokens than a standard response. Smart implementations let users or systems decide when to invoke extended thinking, reserving it for genuinely complex queries.
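Selective invocation can be as simple as a heuristic router in front of the model. The sketch below is illustrative: the marker words and length threshold are assumptions, not any vendor's actual routing logic.

```python
# Hypothetical router: decide when to pay for extended thinking.
# Markers and thresholds are illustrative assumptions.
COMPLEX_MARKERS = ("prove", "derive", "multi-step", "analyze", "plan")

def should_use_extended_thinking(query: str, max_standard_words: int = 40) -> bool:
    """Route analysis-heavy or long queries to extended thinking."""
    text = query.lower()
    if any(marker in text for marker in COMPLEX_MARKERS):
        return True
    # Long queries tend to need multi-step reasoning.
    return len(text.split()) > max_standard_words

simple = should_use_extended_thinking("What is 2 + 2?")                   # False
hard = should_use_extended_thinking("Prove that sqrt(2) is irrational.")  # True
```

A production router might instead use a small classifier model or let the user toggle the mode explicitly; the point is that the decision happens before the expensive call.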
- In practice
- OpenAI's o1 model, launched in September 2024, was the first commercial reasoning model, though it exposed only a summarized version of its chain-of-thought. Anthropic followed with Claude's extended thinking mode, which shows the model's reasoning process in a dedicated thinking block, and DeepSeek-R1 demonstrated that extended thinking could be achieved with open-weight models. In practice, extended thinking models score 20-50% higher than standard models on math competitions, coding challenges, and scientific reasoning benchmarks. Enterprise users report that extended thinking is most valuable for complex document analysis, financial modeling, and legal research, where accuracy is worth the extra latency and cost.
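When reasoning is returned in a dedicated block, client code typically separates it from the final answer. The payload shape below is a simplified assumption modeled on thinking-block-style responses, not an exact API schema.

```python
# Split a response whose content is a list of typed blocks into the
# model's reasoning and its final answer. The block structure here is
# a hypothetical simplification, not a specific provider's schema.
def split_response(content: list[dict]) -> tuple[str, str]:
    """Return (thinking, answer) from a list of content blocks."""
    thinking = " ".join(b["thinking"] for b in content if b["type"] == "thinking")
    answer = " ".join(b["text"] for b in content if b["type"] == "text")
    return thinking, answer

response_content = [
    {"type": "thinking", "thinking": "The user asks for 17 * 24. 17 * 24 = 408."},
    {"type": "text", "text": "17 * 24 = 408."},
]
thinking, answer = split_response(response_content)
```

Keeping the two streams separate lets a product show users only the answer while logging the reasoning for auditing or debugging.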
Related terms
Chain-of-thought (CoT)
A prompting technique that instructs a model to reason step by step before giving a final answer. CoT dramatically improves accuracy on math, logic, and multi-step problems and is now built into many model architectures.
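At its simplest, CoT prompting just wraps the question with an instruction to reason before answering. The wording below is one common illustrative convention, not a required format.

```python
# Minimal chain-of-thought prompt construction. The phrasing is an
# illustrative convention; any wording that elicits stepwise reasoning
# before the final answer serves the same purpose.
def cot_prompt(question: str) -> str:
    return (
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer "
        "on a line beginning with 'Answer:'."
    )

prompt = cot_prompt("A train travels 60 km in 45 minutes. What is its speed in km/h?")
```

The trailing "Answer:" convention makes the final answer easy to parse out of the model's full reasoning trace.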
Reasoning model
An AI model specifically designed to perform multi-step reasoning, typically by generating an explicit chain of thought before producing a final answer. Reasoning models trade inference speed and cost for dramatically improved performance on complex problems.
Test-time compute
The practice of allocating additional compute during inference to improve output quality, rather than relying solely on the capabilities baked in during training. Reasoning models and extended thinking are the primary examples of test-time compute scaling.
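One classic form of test-time compute scaling is repeated sampling with majority voting: draw several answers and keep the most common one. The toy simulation below uses a synthetic "solver" with a fixed accuracy; the numbers are illustrative, not benchmark results.

```python
import random
from collections import Counter

def noisy_solver(correct: int, p_correct: float, rng: random.Random) -> int:
    """A toy 'model' that answers correctly with probability p_correct."""
    if rng.random() < p_correct:
        return correct
    # Wrong answers scatter across nearby values, splitting their votes.
    return correct + rng.choice([-2, -1, 1, 2])

def majority_vote(correct: int, n_samples: int, p_correct: float, seed: int = 0) -> int:
    """Spend more inference compute (n_samples) and take the modal answer."""
    rng = random.Random(seed)
    votes = [noisy_solver(correct, p_correct, rng) for _ in range(n_samples)]
    return Counter(votes).most_common(1)[0][0]
```

Because wrong answers split their votes while correct ones concentrate, accuracy rises with the number of samples, i.e., with test-time compute.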
Inference cost
The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.
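Per-million-token pricing makes cost comparisons straightforward arithmetic. The sketch below contrasts a standard response with an extended-thinking one that emits 50x the output tokens; the prices are placeholder assumptions, not any provider's rates.

```python
# Back-of-envelope inference cost from per-million-token prices.
# Prices below are placeholder assumptions for illustration only.
def cost_usd(input_tokens: int, output_tokens: int,
             input_price_per_m: float, output_price_per_m: float) -> float:
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

standard = cost_usd(1_000, 500, 3.0, 15.0)      # $0.0105
thinking = cost_usd(1_000, 25_000, 3.0, 15.0)   # $0.3780 (50x output tokens)
```

Even at identical per-token prices, the extended-thinking call here costs roughly 36x more, which is why routing only genuinely hard queries to it matters.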