Deep learning
- Definition
- A subset of machine learning that uses neural networks with many layers to learn complex patterns from data. Deep learning powers virtually all modern AI breakthroughs, from image recognition to language generation.
- Why it matters
- Deep learning is the technical foundation of the entire AI boom. Understanding it at a conceptual level, if not a mathematical one, is essential for any executive making AI investment decisions. The key insight: deep learning works by stacking layers of simple mathematical operations, and the 'depth' (the number of layers) is what lets the network learn increasingly abstract representations; the sketch below shows the idea in a few lines of code. This is why more parameters generally (but not always) correlate with more capability. The practical implication for decision-makers: deep learning requires massive data and compute, which favors well-resourced organizations, but transfer learning and fine-tuning make it accessible to smaller teams.
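
A minimal sketch of the stacking idea, using PyTorch (the framework choice, layer sizes, and the 784-dimensional input are illustrative assumptions, not anything specified above): each layer is a simple linear operation plus a nonlinearity, and depth comes from composing them.

```python
import torch.nn as nn

# Depth = stacked simple operations. Each Linear + ReLU pair is one layer;
# later layers operate on earlier layers' outputs, so they can represent
# increasingly abstract features of the input.
model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),  # layer 1: raw inputs -> simple features
    nn.Linear(256, 128), nn.ReLU(),  # layer 2: combinations of features
    nn.Linear(128, 64),  nn.ReLU(),  # layer 3: more abstract representations
    nn.Linear(64, 10),               # output: e.g., scores for 10 classes
)
```
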
- In practice
- Deep learning's commercial breakthrough came when AlexNet won the 2012 ImageNet competition by a wide margin, launching the current AI era. Since then, deep learning has become the dominant approach in vision (ResNet, CLIP), language (GPT, BERT, Claude), speech (Whisper), and generation (Stable Diffusion, DALL-E). The field has moved from convolutional neural networks (CNNs) for images to transformers for nearly everything. Modern frontier models have hundreds of billions of parameters across a hundred or more layers. The compute used to train frontier models has roughly doubled every six months, far outpacing Moore's Law; the back-of-envelope calculation below shows how large that gap gets.
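
To make the growth-rate comparison concrete, a quick back-of-envelope calculation (the 6-month and 24-month doubling periods are the commonly cited figures, used here as assumptions):

```python
# Compound growth: doubling every 6 months vs. Moore's Law's ~24 months.
years = 6
ai_compute = 2 ** (years * 12 / 6)   # 12 doublings -> 4096x
moore = 2 ** (years * 12 / 24)       # 3 doublings  -> 8x
print(f"Over {years} years: training compute ~{ai_compute:,.0f}x, Moore's Law ~{moore:.0f}x")
```
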
Related terms
Neural network
A computing architecture inspired by the brain, made of layers of interconnected nodes (neurons) that learn patterns from data. Neural networks are the fundamental building block of all modern AI.
Transformer
The neural network architecture behind virtually all modern language and multi-modal models. Introduced in Google's 2017 'Attention Is All You Need' paper, transformers use self-attention to process sequences in parallel.
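
A minimal single-head self-attention sketch in plain NumPy (the dimensions, random weights, and single head are simplifying assumptions; the paper's version adds multiple heads, masking, and learned projections inside a larger network):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d); Wq/Wk/Wv: (d, d) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: attention weights per token
    return weights @ V                              # each output is a weighted mix of all tokens

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                        # 5 tokens, 16-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                 # (5, 16)
```

Note that the whole sequence is handled in one set of matrix multiplies, which is what 'in parallel' means here, in contrast to a recurrent network stepping token by token.
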
Pre-training
The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
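
A sketch of the usual pre-training objective for language models, next-token prediction (the tiny model and random token ids are placeholders; a real run repeats this step over trillions of tokens, which is where the cost comes from):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab, dim),  # toy stand-in for a transformer
                      nn.Linear(dim, vocab))

tokens = torch.randint(0, vocab, (1, 17))        # one sequence of token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # learn to predict each next token
logits = model(inputs)                           # (1, 16, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()                                  # one of billions of such updates
```
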
Foundation model
A large, general-purpose model pre-trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models. The term signals massive upfront investment and wide applicability.
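
A sketch of what 'adapted to many downstream tasks' typically looks like in code: freeze the pre-trained trunk and train only a small new head (the trunk below is a stand-in; in practice you would load real pre-trained weights from a model hub):

```python
import torch.nn as nn

base = nn.Sequential(nn.Linear(768, 768), nn.ReLU())  # stand-in for a pre-trained trunk
for p in base.parameters():
    p.requires_grad = False                           # keep the general knowledge frozen

head = nn.Linear(768, 3)                              # cheap task-specific layer (3 labels)
model = nn.Sequential(base, head)
trainable = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.AdamW(trainable, lr=1e-4)   # only the head gets updated
```

This is the transfer-learning pattern mentioned above: the expensive pre-trained layers are reused as-is, so a small team only pays to train the head.
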