Token pricing
- Definition
- The cost model used by AI API providers, charging per million input and output tokens. Prices have fallen dramatically, from $60/M tokens (GPT-4, 2023) to under $1/M tokens for many models in 2026.
- Why it matters
- Token pricing is the single most important variable in AI unit economics. If you are building an AI product, your gross margin is determined by the gap between what you charge users and what you pay in token costs. Understanding token pricing trends is essential for financial planning: costs are dropping approximately 10x per year, which means use cases that are uneconomical today may be highly profitable in 12-18 months. Conversely, building a business model that only works at current pricing is risky, because competitors will build the same thing at lower cost when prices drop. The pricing trend also enables new product categories: real-time voice agents, continuous code review, and always-on monitoring become viable as per-token costs approach zero.
- In practice
- GPT-4 launched at $30/M input tokens and $60/M output tokens in March 2023. GPT-4o Mini launched at $0.15/M input and $0.60/M output in July 2024. Claude 3.5 Sonnet prices at $3/$15 per million tokens. DeepSeek offers frontier-class models at $0.14/$0.28 per million tokens. Output tokens are typically 2-5x more expensive than input tokens because they are generated one at a time, each requiring its own forward pass, whereas input tokens are processed in parallel during prefill. The pricing war is intensifying as open-source models (free to self-host) set a floor. Enterprise customers negotiate volume discounts of 20-50%. The trend clearly favors builders: features that cost $100/day in API calls today will cost $10/day within a year.
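The price points quoted above translate into per-call costs as follows. This is a minimal sketch using the rates from this section; the 1,000/1,000 token call size is an illustrative assumption:

```python
# Per-call cost at the (input, output) prices quoted in this section,
# in USD per million tokens. Call size below is an assumed example.

PRICES = {
    "GPT-4 (Mar 2023)":       (30.00, 60.00),
    "Claude 3.5 Sonnet":      (3.00, 15.00),
    "GPT-4o Mini (Jul 2024)": (0.15, 0.60),
    "DeepSeek":               (0.14, 0.28),
}

def call_cost(input_tokens: int, output_tokens: int,
              in_price: float, out_price: float) -> float:
    """Cost in USD of a single API call at the given per-million-token rates."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Cost of one call with 1,000 input and 1,000 output tokens:
for model, (inp, out) in PRICES.items():
    print(f"{model}: ${call_cost(1_000, 1_000, inp, out):.5f}")
# GPT-4 (Mar 2023): $0.09000
# DeepSeek:         $0.00042
```

The same call spans roughly a 200x cost range across these models, which is the spread that makes model selection a first-order pricing decision.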
Related terms
Token
The basic unit of text that AI models process, roughly equivalent to 3/4 of a word in English. Tokens are how models read, price, and limit input and output, making token efficiency a key cost lever.
Inference cost
The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.
Inference economics
The study of costs, pricing models, and margin structures around running AI models in production, encompassing hardware costs, model efficiency, pricing strategies, and the competitive dynamics of the inference market.
API (Application Programming Interface)
The programmatic interface that lets developers send prompts to an AI model and receive responses. Model vendors like OpenAI, Anthropic, and Google monetize primarily through API access, priced per token.