Throughput
- Definition
- The number of tokens or requests an AI system can process per second. High throughput is essential for batch processing, high-traffic applications, and cost-efficient inference at scale.
- Why it matters
- Throughput determines how much work your AI infrastructure can handle and, by extension, how many users you can serve cost-effectively. Latency tells you how fast a single request completes; throughput tells you how many requests you can handle per unit of time. For applications serving millions of users, throughput is often the binding constraint: you might have acceptable per-request latency but insufficient throughput to handle peak traffic. Optimizing throughput involves batching requests, using continuous batching, maximizing GPU utilization, and choosing the right hardware. For self-hosted deployments, throughput directly determines your cost per request and therefore your gross margin.
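The link between throughput and cost per request can be sketched with back-of-envelope math. The figures below (GPU hourly price, sustained tokens per second, tokens per request) are illustrative assumptions, not vendor quotes:

```python
# Back-of-envelope throughput economics. All input numbers are
# hypothetical assumptions chosen for illustration.

def cost_per_million_tokens(gpu_dollars_per_hour: float, tokens_per_second: float) -> float:
    """Dollars per 1M output tokens for one GPU at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_dollars_per_hour / tokens_per_hour * 1_000_000

def requests_per_second(tokens_per_second: float, avg_tokens_per_request: float) -> float:
    """How many requests per second a given token throughput supports."""
    return tokens_per_second / avg_tokens_per_request

# Example: a $2.50/hr GPU sustaining 2,500 output tokens/sec, 500-token responses.
print(f"${cost_per_million_tokens(2.50, 2500):.2f} per 1M output tokens")  # $0.28
print(f"{requests_per_second(2500, 500):.1f} requests/sec")                # 5.0
```

Doubling sustained throughput on the same hardware halves the cost per token, which is why serving-stack optimization shows up directly in gross margin.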
- In practice
- vLLM's continuous batching technique increased throughput by 2-4x compared to static batching by processing new requests as soon as GPU cycles become available. Groq's LPU hardware achieves over 500 tokens per second on Llama models, roughly 10x the throughput of GPU-based serving. NVIDIA's TensorRT-LLM optimizes transformer inference for maximum throughput on NVIDIA hardware. In production, companies monitor throughput in tokens-per-second-per-GPU to evaluate infrastructure efficiency. A well-optimized serving stack on H100s can achieve 2,000-5,000 output tokens per second for a 70B model, compared to 200-500 for naive implementations.
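The intuition behind continuous batching's gain can be shown with a toy decode-step simulation (a hypothetical model, not vLLM code): static batching holds every slot until the longest request in the batch finishes, while continuous batching refills freed slots from the queue every step.

```python
# Toy comparison of static vs. continuous batching. "lengths" are the
# number of decode steps each request needs; capacity is concurrent slots.
# This is an illustrative sketch, not an implementation of vLLM.

def static_batching_steps(lengths, capacity=4):
    """Each batch runs until its longest request finishes; freed slots idle."""
    steps = 0
    for i in range(0, len(lengths), capacity):
        steps += max(lengths[i:i + capacity])
    return steps

def continuous_batching_steps(lengths, capacity=4):
    """Freed slots are refilled from the queue on every decode step."""
    queue = list(lengths)
    active = []
    steps = 0
    while queue or active:
        while queue and len(active) < capacity:
            active.append(queue.pop(0))
        steps += 1
        active = [n - 1 for n in active if n > 1]
    return steps

lengths = [100, 10, 10, 10, 100, 10, 10, 10]
print(static_batching_steps(lengths))      # 200: each batch gated by a 100-step request
print(continuous_batching_steps(lengths))  # 110: short requests no longer block slots
```

With this mix of long and short requests, refilling slots finishes the same work in roughly half the time, which is the mechanism behind the 2-4x throughput gains cited above.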
Related terms
Latency
The time between sending a request to an AI model and receiving the first token of the response. Low latency is critical for real-time applications like coding assistants, voice agents, and live customer support.
Inference
The process of running a trained model to generate predictions or outputs from new inputs. Inference cost per token is the key economic metric for AI deployment and is falling rapidly.
Batch processing
Running multiple AI inference requests together to maximize throughput and reduce per-request cost. Batch processing is how companies handle large-scale data labeling, content generation, and analytics workloads efficiently.
GPU (Graphics Processing Unit)
The hardware chip that powers AI training and inference. NVIDIA's H100 and B200 GPUs are the most sought-after compute in the industry, with wait times and pricing driving major strategic decisions.