Model Wars | April 16, 2025 | via OpenAI Blog
Thinking with images
Why it matters
OpenAI's o3 and o4-mini models bring images directly into their chain-of-thought reasoning. This moves multimodal capability beyond recognizing what an image contains to reasoning over it step by step, a material shift in how vision-language models process and interpret images.
Key signals
- OpenAI releases o3 and o4-mini with visual reasoning in chain of thought
- Models can now use images WITHIN their reasoning process, not only reason about them
- Represents a breakthrough in multimodal perception capabilities
- Capability: visual perception through reasoning chains
The hook
OpenAI o3 and o4-mini now reason WITH images, not just about them. Chain-of-thought visual perception is a new capability tier.
OpenAI o3 and o4-mini represent a significant breakthrough in visual perception by reasoning with images in their chain of thought.
Relevance score: 78/100