Token
- Definition
- The basic unit of text that AI models process, roughly equivalent to 3/4 of a word in English. Tokens are the unit by which models read text and by which providers price and limit input and output, making token efficiency a key cost lever.
- Why it matters
- Tokens are the atoms of the AI economy. Every interaction with an AI model is measured, priced, and limited in tokens. Understanding tokenization helps you estimate costs, optimize prompts, and debug unexpected model behavior. A 100-word email is roughly 130 tokens; a 50-page document is roughly 15,000 tokens. Token efficiency (getting more done with fewer tokens) directly impacts your AI costs. This is why concise system prompts, efficient retrieval (fetching only relevant documents), and smart caching matter. For business planning, token consumption patterns determine whether an AI feature's unit economics work at scale.
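The rough conversions above (about 3/4 of a word, or roughly 4 characters, per English token) can be sketched as a quick estimator. This is a heuristic only, not a real tokenizer, and actual counts vary by model:

```python
def estimate_tokens_by_chars(text: str) -> int:
    """Rough English estimate: ~1 token per 4 characters."""
    return max(1, round(len(text) / 4))

def estimate_tokens_by_words(text: str) -> int:
    """Alternative heuristic: ~4 tokens per 3 words."""
    return max(1, round(len(text.split()) * 4 / 3))

# A 100-word email lands near the ~130-token figure quoted above.
email = " ".join(["word"] * 100)
print(estimate_tokens_by_words(email))  # → 133
```

For anything cost-sensitive, count with the actual tokenizer for your model (e.g. OpenAI's tiktoken) rather than a heuristic.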
- In practice
- OpenAI's tiktoken, Anthropic's tokenizer, and Google's SentencePiece split text into tokens differently, so the same text can be a different number of tokens depending on the model. GPT-4o uses approximately 25% fewer tokens than GPT-4 for the same text thanks to an improved tokenizer. Multilingual text typically tokenizes less efficiently than English, meaning the same content costs more. Token pricing ranges from $0.10/M tokens (efficient models) to $60/M tokens (frontier reasoning models). Companies optimize token usage through prompt compression, caching repeated context, batching similar requests, and choosing the right model size for each task.
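That pricing spread translates directly into unit economics. A minimal sketch, using the illustrative per-million-token endpoints from this entry (not quotes for any specific model) and hypothetical request volumes:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m +
            output_tokens * output_price_per_m) / 1_000_000

# Illustrative workload: 10,000 requests/day, 2,000 input + 500 output tokens each.
daily_efficient = 10_000 * request_cost(2_000, 500, 0.10, 0.10)
daily_frontier = 10_000 * request_cost(2_000, 500, 60.0, 60.0)
print(f"${daily_efficient:.2f}/day vs ${daily_frontier:.2f}/day")
# → $2.50/day vs $1500.00/day
```

The 600x gap between tiers is why model selection and token trimming are the first levers in any AI cost review.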
Related terms
Tokenizer
The algorithm that splits text into tokens before a model can process it. Different models use different tokenizers, which affects how efficiently they handle various languages, code, and specialized content.
Token pricing
The cost model used by AI API providers, charging per million input and output tokens. Prices have fallen dramatically, from $60/M tokens (GPT-4, 2023) to under $1/M tokens for many models in 2026.
Context window
The maximum number of tokens a model can process in a single request, including both the prompt and the response. Larger context windows (100K-2M tokens) let models ingest entire codebases or documents at once.
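A practical consequence: before sending a request, the prompt tokens plus the output budget you reserve must fit inside the window. A minimal sketch, assuming you already have a token count from your provider's tokenizer:

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int) -> bool:
    """True if the prompt plus reserved output budget fits in the window."""
    return prompt_tokens + max_output_tokens <= context_window

# A ~50-page document (~15,000 tokens) plus a 4,000-token reply fits easily
# in a 128K window, but not in an older 16K one.
print(fits_in_context(15_000, 4_000, 128_000))  # → True
print(fits_in_context(15_000, 4_000, 16_000))   # → False
```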
Inference cost
The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.