World model
- Definition
- An internal representation of how the world works that an AI system uses to predict outcomes, plan actions, and reason about physical or causal relationships. Many researchers consider world models essential for general intelligence and advanced robotics.
- Why it matters
- The world model debate sits at the center of one of AI's biggest philosophical and technical questions: do LLMs actually understand the world, or are they only pattern-matching on text? Critics, most prominently Yann LeCun, argue that text-only models cannot build genuine world models and that new architectures are needed. Proponents argue that LLMs have already developed implicit world models from training on descriptions of the world. The question matters in practice because genuine world models would enable planning, causal reasoning, and physical interaction, capabilities widely seen as necessary for AGI and advanced robotics. For the industry, the answer bears on whether current architectures will plateau or keep improving.
- In practice
- Yann LeCun at Meta has championed the Joint Embedding Predictive Architecture (JEPA) as a path to world models, arguing that transformers trained on text alone cannot develop true understanding. Google DeepMind's Genie 2 generates interactive 3D environments from a single image, demonstrating implicit world modeling. Tesla's FSD and Waymo's self-driving systems use learned world models to predict other vehicles' behavior. Sora, OpenAI's video generation model, implicitly models physics by generating realistic video, though it still produces physically impossible results. A research consensus is emerging that multi-modal training (text + video + interaction) is more likely to produce robust world models than text alone.
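To make the definition concrete, here is a minimal sketch of the "predict outcomes, plan actions" loop. It is a toy under stated assumptions, not how any system named above works: the ten-cell track, the `ACTIONS` set, and the functions `predict_next` and `plan` are all hypothetical, and the model is hand-written rather than learned.

```python
from itertools import product

# Hypothetical action set: move left, stay, move right.
ACTIONS = (-1, 0, 1)


def predict_next(position: int, action: int) -> int:
    """Toy world model: predict the next position on a track of cells 0..9."""
    return max(0, min(9, position + action))


def rollout(start: int, actions) -> int:
    """Imagine the outcome of an action sequence without acting in the world."""
    position = start
    for action in actions:
        position = predict_next(position, action)
    return position


def plan(start: int, goal: int, horizon: int = 4):
    """Pick the action sequence whose imagined end state is closest to the goal."""
    candidates = product(ACTIONS, repeat=horizon)
    return min(candidates, key=lambda actions: abs(goal - rollout(start, actions)))


best = plan(start=2, goal=5)
print(best, "ends at", rollout(2, best))
```

The planner never touches the real environment: it only queries the model to compare imagined futures. In a real system the model would typically be learned from data and the exhaustive search replaced by something cheaper.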
Related terms
AGI (Artificial General Intelligence)
A hypothetical AI system that matches or exceeds human-level reasoning across every cognitive domain. No AGI exists today, but the race to build one is driving hundreds of billions in investment.
Embodied AI
AI systems that interact with the physical world through a robotic body or sensor array, combining perception, planning, and motor control. Embodied AI bridges the gap between digital intelligence and physical action.
Multi-modal
An AI model that can process and generate multiple data types, such as text, images, audio, and video, in a single system. Multi-modal models like GPT-4o and Gemini are merging previously separate AI capabilities.
Foundation model
A large, general-purpose model pre-trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models. The term signals massive upfront investment and wide applicability.