TPU (Tensor Processing Unit)
- Definition
- Google's custom AI accelerator chip, designed specifically for neural network workloads. TPUs power Google's internal AI training and are available via Google Cloud, competing with NVIDIA's GPU ecosystem.
- Why it matters
- TPUs represent the most mature alternative to NVIDIA's GPU dominance. Google first deployed TPUs internally in 2015 and announced them publicly in 2016; they now power all of Google's AI products: Search, Gemini, YouTube recommendations, and Google Cloud AI services. For the industry, TPUs demonstrate that custom silicon can match or exceed general-purpose GPUs for AI workloads. This has inspired Amazon (Trainium/Inferentia), Microsoft (Maia), and numerous startups to develop custom AI chips. For enterprise buyers, TPU availability on Google Cloud provides a competitive alternative to NVIDIA-based offerings, potentially reducing costs and vendor lock-in. The custom silicon trend is gradually loosening NVIDIA's grip on the AI compute market.
- In practice
- Google's TPU v5p, announced in late 2023, delivers 459 TFLOPS per chip and scales to pods of 8,960 chips for massive training runs. Gemini was trained on TPU v5p pods. TPUs are available on Google Cloud at competitive rates compared to NVIDIA GPUs. The TPU software ecosystem (JAX, XLA) has matured significantly but still trails NVIDIA's CUDA ecosystem in breadth. Companies like Anthropic have used TPUs for training Claude models. For inference, TPUs compete well on price-performance for large batch workloads. The strategic significance: Google's TPU investment ensures it cannot be held hostage by NVIDIA supply constraints, a competitive advantage during GPU shortages.
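A rough back-of-envelope check on the pod figures above, using only the per-chip and pod-size numbers quoted in this entry; real sustained throughput is lower and depends on model FLOPS utilization, interconnect topology, and numeric precision:

```python
# Back-of-envelope peak compute for a TPU v5p pod, using the
# figures quoted above (459 TFLOPS per chip, pods of 8,960 chips).
# This is theoretical peak; sustained training throughput is
# substantially lower in practice.

TFLOPS_PER_CHIP = 459      # peak per-chip throughput, per the text
CHIPS_PER_POD = 8_960      # maximum pod size, per the text

pod_tflops = TFLOPS_PER_CHIP * CHIPS_PER_POD
pod_exaflops = pod_tflops / 1_000_000  # 1 exaFLOPS = 1,000,000 TFLOPS

print(f"Peak pod compute: {pod_exaflops:.2f} exaFLOPS")  # ~4.11 exaFLOPS
```

At roughly 4 exaFLOPS of peak compute per pod, the scale of a single training run on this hardware becomes concrete, which is why pod size (not just per-chip speed) is the headline number in accelerator announcements.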
Related terms
GPU (Graphics Processing Unit)
The hardware chip that powers AI training and inference. NVIDIA's H100 and B200 GPUs are the most sought-after compute in the industry, with wait times and pricing driving major strategic decisions.
Training
The process of teaching a neural network by feeding it data and adjusting its parameters to minimize prediction errors. Training frontier models now costs $100M+ and takes months on thousands of GPUs.
Inference
The process of running a trained model to generate predictions or outputs from new inputs. Inference cost per token is the key economic metric for AI deployment and is falling rapidly.
Hyperscaler
A cloud computing provider operating at massive scale, primarily Microsoft Azure, Amazon AWS, and Google Cloud. Hyperscalers provide the GPU infrastructure, managed AI services, and global data center networks that power most AI deployments.