Models & Architecture · Core

Parameter

Definition
A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
Why it matters
Parameter count is the most-cited but most-misunderstood metric in AI. More parameters generally mean more capability, but the relationship is not linear: architecture, training data quality, and training duration all matter as much or more. A well-trained 7B model can outperform a poorly trained 70B model on many tasks. Parameter count does correlate reliably with two things: memory requirements (how much GPU RAM you need) and inference cost (roughly proportional to parameters for dense models). Understanding this helps you make informed model-selection decisions rather than defaulting to 'bigger is better.'
In practice
The parameter count race has produced models from 1B (Phi-3 Mini) to 405B (Llama 3.1) to rumored 1T+ (GPT-4). Microsoft's Phi-3 family proved that smaller parameter counts with better data can match much larger models. The mixture-of-experts architecture complicates parameter counting: Mixtral 8x7B has 47B total parameters but activates only about 13B per token. For deployment planning, parameter count determines GPU requirements: a 70B model in FP16 needs approximately 140GB of GPU memory, requiring at least two H100 GPUs. Quantization to 4-bit reduces this to approximately 35GB, fitting on a single GPU.
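The memory figures above follow from simple arithmetic: weight memory is parameter count times bytes per parameter. Here is a minimal back-of-envelope sketch of that calculation; `weight_memory_gb` is a hypothetical helper written for this example, not part of any library, and it counts weights only (activations and KV cache add real-world overhead on top).

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate GPU memory (GB) needed just to hold model weights.

    memory = parameters x (bits_per_param / 8) bytes.
    Ignores activations and KV cache, which add further overhead.
    """
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param  # billions of params x bytes = GB

# Figures from the text: a 70B dense model.
print(weight_memory_gb(70, 16))  # FP16  -> 140.0 GB (at least two 80GB H100s)
print(weight_memory_gb(70, 4))   # 4-bit ->  35.0 GB (fits on a single GPU)
```

The same arithmetic explains why mixture-of-experts models are cheaper to run than their total parameter count suggests: inference compute scales with the ~13B active parameters per token, while memory still scales with all 47B, since every expert must be resident.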
