GPU (Graphics Processing Unit)
- Definition
- The specialized processor that powers AI training and inference. Originally designed for graphics rendering, its massively parallel architecture proved ideal for the matrix math at the heart of neural networks. NVIDIA's H100 and B200 GPUs are the most sought-after compute in the industry, with wait times and pricing driving major strategic decisions.
- Why it matters
- GPUs are the bottleneck of the AI industry. Every model trained, every inference served, and every AI product deployed depends on GPU availability. NVIDIA controls approximately 80-90% of the AI accelerator market, creating a single-vendor dependency that keeps CEOs awake at night. GPU access determines who can train frontier models, how fast inference can scale, and what AI products are economically viable. The GPU shortage of 2023-2024 shaped the entire industry: companies with reserved GPU capacity had a strategic advantage, while those without were locked out of frontier training. Understanding GPU economics is essential for anyone making AI infrastructure decisions.
- In practice
- NVIDIA's H100 GPUs shipped in volume starting in 2023 at approximately $30,000 each, with cloud rental at $2-3/hour. The B200 (Blackwell architecture, 2024-2025) offered roughly 2.5x the H100's performance. The 'GPU rich' companies (Microsoft, Google, Meta, Amazon) secured tens of thousands of GPUs through multi-billion-dollar orders, while startups and smaller companies relied on cloud providers or specialized GPU clouds like CoreWeave, Lambda, and Together AI. AMD's MI300X emerged as a credible alternative, and custom silicon from Google (TPU), Amazon (Trainium), and Microsoft (Maia) is diversifying the market. The GPU bottleneck is gradually easing as supply ramps up.
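As a back-of-the-envelope illustration of these economics, the sketch below compares buying an H100 outright against renting one, using the purchase price and cloud rates quoted above. The hosting-overhead and utilization figures are illustrative assumptions, not market data.

```python
# Back-of-the-envelope buy-vs-rent comparison for a single H100, using
# the purchase price and cloud rates quoted above. The hosting overhead
# and utilization figures are illustrative assumptions, not market data.

PURCHASE_PRICE = 30_000   # USD, approximate H100 price from the text
CLOUD_RATE = 2.50         # USD/hour, midpoint of the $2-3/hour range
HOSTING_OVERHEAD = 0.40   # USD/hour assumed for power, cooling, space
UTILIZATION = 0.70        # assumed fraction of hours the GPU is kept busy

# Owning saves (CLOUD_RATE - HOSTING_OVERHEAD) per busy hour, so the
# purchase pays for itself after this many GPU-hours of real work:
breakeven_hours = PURCHASE_PRICE / (CLOUD_RATE - HOSTING_OVERHEAD)
busy_hours_per_year = 24 * 365 * UTILIZATION

print(f"Break-even after {breakeven_hours:,.0f} GPU-hours, "
      f"about {breakeven_hours / busy_hours_per_year:.1f} years "
      f"at {UTILIZATION:.0%} utilization")
```

Under these assumptions the break-even point lands a bit past two years of steady use, which is roughly why high-utilization labs buy and everyone else rents.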
Related terms
TPU (Tensor Processing Unit)
Google's custom AI accelerator chip, designed specifically for neural network workloads. TPUs power Google's internal AI training and are available via Google Cloud, competing with NVIDIA's GPU ecosystem.
Training
The process of teaching a neural network by feeding it data and adjusting its parameters to minimize prediction errors. Training frontier models now costs $100M+ and takes months on thousands of GPUs.
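A minimal sketch of that loop (predict, measure error, adjust parameters) appears below, fitting a one-parameter linear model with plain gradient descent. Frontier runs do the same thing across billions of parameters on thousands of GPUs, using frameworks rather than hand-rolled NumPy.

```python
import numpy as np

# Minimal sketch of the training loop described above: predict, measure
# error, adjust parameters to reduce it. A one-parameter linear model
# stands in for a neural network.

rng = np.random.default_rng(0)
x = rng.normal(size=1_000)
y = 3.0 * x + rng.normal(scale=0.1, size=1_000)  # true slope is 3.0

w = 0.0    # the single trainable parameter
lr = 0.1   # learning rate: how far to step on each update

for step in range(100):
    pred = w * x
    grad = 2.0 * np.mean((pred - y) * x)  # gradient of mean squared error
    w -= lr * grad                        # step downhill on the error

print(f"learned w = {w:.3f} (target 3.0)")
```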
Inference
The process of running a trained model to generate predictions or outputs from new inputs. Inference cost per token is the key economic metric for AI deployment and is falling rapidly.
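The metric itself is simple arithmetic: divide the hourly GPU rental rate by tokens served per hour. In the sketch below, the throughput figure is an illustrative assumption; real numbers vary widely with model size, batching, and serving stack.

```python
# Rough cost-per-token arithmetic for serving a model on a rented GPU.
# The throughput figure is an illustrative assumption, not a benchmark.

GPU_RATE = 2.50              # USD/hour, from the cloud range quoted above
TOKENS_PER_SECOND = 2_000    # assumed aggregate throughput for one GPU

tokens_per_hour = TOKENS_PER_SECOND * 3_600
cost_per_million = GPU_RATE / tokens_per_hour * 1_000_000
print(f"~${cost_per_million:.2f} per million tokens")
```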
FLOPS (Floating-Point Operations Per Second)
A measure of computational throughput, in floating-point operations per second. Training frontier models requires clusters that sustain exaFLOPS-scale (10^18 operations per second) aggregate throughput for weeks or months, making compute a proxy for the cost and scale of an AI project.
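One common way to estimate a training run's total compute is the approximation FLOPs ≈ 6 × parameters × training tokens. The sketch below applies it with illustrative model-size, token-count, and utilization figures; none of these are any specific model's stats.

```python
# Training-compute estimate using the common approximation
# total FLOPs ≈ 6 × parameters × training tokens. The model size, token
# count, and utilization are illustrative, not any real model's stats.

params = 70e9        # assumed 70B-parameter model
tokens = 2e12        # assumed 2 trillion training tokens
total_flops = 6 * params * tokens             # ≈ 8.4e23 FLOPs

peak_per_gpu = 1e15  # ~1 petaFLOPS peak (BF16) per H100-class GPU, rough
utilization = 0.40   # assumed fraction of peak actually sustained
gpu_seconds = total_flops / (peak_per_gpu * utilization)

print(f"{total_flops:.1e} FLOPs ≈ {gpu_seconds / 86_400:,.0f} GPU-days")
```

Under these assumptions the run needs on the order of 24,000 GPU-days, i.e. weeks of wall-clock time even on a cluster of thousands of GPUs.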