Neural network
- Definition
- A computing architecture inspired by the brain, made of layers of interconnected nodes (neurons) that learn patterns from data. Neural networks are the fundamental building block of virtually all modern AI.
- Why it matters
- Neural networks are to AI what transistors are to computing: the fundamental building block everything else is built on. Every LLM, every image generator, every speech recognizer is a neural network. Understanding the basic concept (layers of mathematical operations that transform inputs into outputs through learned parameters) gives you the mental model to understand why AI works the way it does. It explains why more data and more compute generally lead to better performance (more examples to learn from, more parameters to encode patterns), why AI can be biased (it learns from its training data), and why AI outputs are probabilistic (the network outputs probabilities, not certainties).
- In practice
- The modern AI era began with AlexNet (2012), an 8-layer neural network that won the ImageNet competition. Today's frontier models have hundreds of layers and hundreds of billions of parameters. The dominant neural network architecture is the transformer, but convolutional networks (for vision), recurrent networks (for sequences), and graph networks (for relational data) remain important for specialized tasks. Neural network training requires GPUs because the matrix multiplications at the heart of neural networks map naturally to GPU architectures. PyTorch (from Meta) and JAX (from Google) are the dominant frameworks for building and training neural networks.
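The basic concept described above, layers of matrix multiplications with learned parameters that turn an input into output probabilities, can be sketched in a few lines of Python. The layer sizes and random weights here are purely illustrative, not taken from any real model:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network: 4 inputs -> 8 hidden units -> 3 output classes.
# In a real model these weights (parameters) are learned from data;
# here they are random, just to show the shape of the computation.
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)      # layer 1: matrix multiply + ReLU nonlinearity
    logits = h @ W2 + b2                # layer 2: matrix multiply
    p = np.exp(logits - logits.max())   # softmax: scores -> probabilities
    return p / p.sum()

probs = forward(rng.standard_normal(4))
print(probs)  # three class probabilities that sum to 1
```

Note that the output is a probability distribution, not a single certain answer, which is exactly why neural network outputs are probabilistic.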
Related terms
Deep learning
A subset of machine learning that uses neural networks with many layers to learn complex patterns from data. Deep learning powers virtually all modern AI breakthroughs, from image recognition to language generation.
Transformer
The neural network architecture behind virtually all modern language and multi-modal models. Introduced in Google's 2017 'Attention Is All You Need' paper, transformers use self-attention to process sequences in parallel.
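The self-attention mechanism mentioned above can be sketched in a few lines; this is a minimal single-head version with toy dimensions (5 tokens, 16-dim embeddings), not a faithful reproduction of any production transformer:

```python
import numpy as np

rng = np.random.default_rng(1)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise token similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over each row
    return w @ V                                      # each token mixes info from every token

d = 16
X = rng.standard_normal((5, d))                       # 5 tokens, 16-dim embeddings (toy sizes)
out = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)                                      # same (seq_len, d_model) shape out
```

Because every token attends to every other token in one batch of matrix multiplications, the whole sequence is processed in parallel rather than step by step.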
Parameter
A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
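Parameter counts come straight from layer shapes: a fully connected layer has one weight per input-output pair plus one bias per output. The layer sizes below are illustrative, not from any named model:

```python
# Learnable parameters in one fully connected layer:
# an (n_in x n_out) weight matrix plus n_out biases.
def layer_params(n_in, n_out):
    return n_in * n_out + n_out

# A tiny 3-layer network: 784 -> 256 -> 64 -> 10
sizes = [784, 256, 64, 10]
total = sum(layer_params(a, b) for a, b in zip(sizes, sizes[1:]))
print(total)  # 218058 parameters
```

The same bookkeeping, repeated across hundreds of much wider layers, is how frontier models reach hundreds of billions of parameters.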
Weights
The numerical values inside a neural network that encode everything the model has learned during training. Model weights are the core intellectual property of AI companies and the subject of intense open-vs.-closed debates.