Parameter
- Definition
- A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
- Why it matters
    - Parameter count is the most-cited but most-misunderstood metric in AI. More parameters generally mean more capability, but the relationship is not linear: architecture, training-data quality, and training duration matter as much as, or more than, raw size. A well-trained 7B model can outperform a poorly trained 70B model on many tasks. Parameter count does correlate reliably with two things: memory requirements (how much GPU RAM you need) and inference cost (roughly proportional to parameter count for dense models). Understanding this helps you make informed model-selection decisions rather than defaulting to 'bigger is better.'
- In practice
    - The parameter-count race has produced models ranging from 3.8B (Phi-3 Mini) to 405B (Llama 3.1) to a rumored 1T+ (GPT-4). Microsoft's Phi-3 family showed that smaller models trained on better-curated data can match much larger ones. Mixture-of-experts architectures complicate parameter counting: Mixtral 8x7B has about 47B total parameters but activates only about 13B per token. For deployment planning, parameter count determines GPU requirements: a 70B model in FP16 needs approximately 140GB of GPU memory (2 bytes per parameter), requiring at least two 80GB H100 GPUs. Quantization to 4-bit reduces this to approximately 35GB, fitting on a single GPU.
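The memory arithmetic above is simple enough to sketch. A minimal estimate, assuming weight memory is just parameter count times bytes per parameter (the function name is illustrative, and the estimate ignores activation and KV-cache overhead):

```python
def model_memory_gb(params_billions, bits_per_param):
    """Estimate raw weight memory: parameters x bytes per parameter, in decimal GB."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# 70B parameters in FP16 (16 bits each): ~140 GB of weights alone
print(model_memory_gb(70, 16))  # 140.0
# the same model quantized to 4-bit: ~35 GB
print(model_memory_gb(70, 4))   # 35.0
```

The same arithmetic explains the MoE caveat: Mixtral's inference speed tracks its ~13B active parameters, but its memory footprint tracks all ~47B, since every expert must be resident.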
Related terms
Weights
The numerical values inside a neural network that encode everything the model has learned during training. Model weights are the core intellectual property of AI companies and the subject of intense open-vs.-closed debates.
Neural network
A computing architecture inspired by the brain, made of layers of interconnected nodes (neurons) that learn patterns from data. Neural networks are the fundamental building block of all modern AI.
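To make the connection between these terms concrete, here is a toy sketch (names and numbers are illustrative) of one fully connected layer. The weights and biases below are exactly the "parameters" that model sizes count:

```python
def dense_layer(inputs, weights, biases):
    """One fully connected layer: each output neuron is a weighted sum of the
    inputs plus a bias, passed through a ReLU nonlinearity."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, w_row)) + b
        outputs.append(max(0.0, z))  # ReLU: keep positive values, zero out the rest
    return outputs

# 3 inputs -> 2 neurons; the 6 weights + 2 biases are this layer's 8 parameters
out = dense_layer([1.0, 2.0, 3.0],
                  [[0.1, 0.2, 0.3], [-0.5, 0.0, 0.5]],
                  [0.0, 0.1])
print(out)
```

A 70B-parameter model is, at heart, billions of these weighted sums stacked into many layers; training adjusts each of those numbers.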
Quantization
Reducing the numerical precision of a model's weights (e.g., from 32-bit to 4-bit) to shrink its memory footprint and speed up inference. Quantization makes it possible to run large models on consumer hardware.
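As an illustration, here is a minimal sketch of symmetric integer quantization (the function names and per-tensor scaling are simplifying assumptions; production schemes typically quantize per group and handle outliers specially):

```python
def quantize_symmetric(weights, bits=4):
    """Map float weights onto signed integers in [-(2^(bits-1)-1), 2^(bits-1)-1],
    storing one float scale factor for the whole tensor."""
    qmax = 2 ** (bits - 1) - 1          # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers plus the scale."""
    return [v * scale for v in q]

w = [0.9, -0.35, 0.02, -0.7]
q, s = quantize_symmetric(w)
print(q)                 # small integers in [-7, 7]
print(dequantize(q, s))  # approximate reconstruction of w
```

The memory win comes from storing 4-bit integers instead of 16- or 32-bit floats; the cost is the small reconstruction error visible in the round trip.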
FLOPS (Floating-Point Operations Per Second)
A measure of computational throughput (floating-point operations per second). Training frontier models consumes on the order of 10^24–10^25 floating-point operations in total, making FLOPs a proxy for the cost and scale of an AI project.
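A widely cited rule of thumb from the scaling-law literature estimates training compute as roughly 6 × N × D FLOPs for N parameters and D training tokens. The sketch below applies it (the specific model size and token count are illustrative):

```python
def training_flops(params, tokens):
    """Rule-of-thumb estimate: total training compute ~ 6 * N * D FLOPs."""
    return 6 * params * tokens

# e.g. a 70B-parameter model trained on 15 trillion tokens
flops = training_flops(70e9, 15e12)
print(f"{flops:.1e}")  # about 6.3e24 FLOPs
```

Dividing that total by a cluster's sustained FLOPS gives a rough training-time estimate, which is why compute budgets are quoted this way.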