LLM (Large Language Model)
- Definition
- A neural network trained on massive text corpora to predict and generate language. LLMs like GPT-4, Claude, and Gemini are the foundation of the current AI wave, powering chatbots, coding tools, and enterprise automation.
- Why it matters
- LLMs are the most commercially significant AI technology ever developed. They have created a $100B+ market in under three years, disrupted software development, transformed customer service, and triggered the largest infrastructure buildout since the internet. Understanding LLMs' capabilities, limitations, and economics is now a core competency for every business leader. LLMs are not magic: they predict the most likely next token based on patterns in their training data (see the sketch after this list). That makes them brilliant at tasks that match their training distribution and unreliable at tasks that demand genuine novelty or factual precision. Knowing when to use LLMs and when not to is the skill that separates successful AI deployments from expensive failures.
- In practice
- The LLM market is dominated by a handful of players: OpenAI (GPT-4, o1), Anthropic (Claude), Google (Gemini), Meta (Llama), and Mistral. Enterprise adoption has moved from experimentation to production: by 2025, an estimated 65% of Fortune 500 companies had at least one LLM-powered feature in production. Use cases range from customer support automation (reducing ticket volumes by 40-60%) to code generation (GitHub Copilot contributing 40%+ of code at some companies) to document analysis (legal and financial firms processing thousands of pages per day). The market is rapidly segmenting by size, cost, and capability, with different models optimized for different use cases.
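To make "predict the next token" concrete, here is a minimal generation loop using the open-source Hugging Face transformers library, with GPT-2 standing in for any decoder-only LLM; the prompt and token count are arbitrary illustrative choices:

```python
# A minimal next-token generation loop. GPT-2 stands in here for any
# decoder-only LLM; the mechanism is the same at every scale.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The biggest cost in deploying an LLM is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(20):                    # generate 20 tokens, one at a time
        logits = model(input_ids).logits   # shape: (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()   # greedy pick: single most likely token
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Production systems replace the greedy argmax with sampling (temperature, top-p), but the core loop is the same: predict one token, append it, repeat.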
Related terms
Transformer
The neural network architecture behind virtually all modern language and multimodal models. Introduced in Google's 2017 paper "Attention Is All You Need," transformers use self-attention to process all positions in a sequence in parallel, rather than one at a time as earlier recurrent networks did.
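As a toy illustration, here is scaled dot-product self-attention in plain NumPy; the dimensions and random weights are arbitrary stand-ins for learned parameters:

```python
# Toy scaled dot-product self-attention, the core transformer operation.
# Dimensions and random weights are illustrative, not real model sizes.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); every position attends to every other, all at once."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # project to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise position similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: attention weights
    return weights @ v                               # weighted mix of value vectors

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                          # 4 positions, 8-dim embeddings
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # -> (4, 8)
```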
Foundation model
A large, general-purpose model pre-trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models. The term signals massive upfront investment and wide applicability.
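One minimal sketch of what "adapted to downstream tasks" looks like in code, using Hugging Face transformers; the model name and three-label ticket-triage task are illustrative assumptions, not a recommendation:

```python
# Adapting a pretrained model: load the pretrained "body" and attach a
# fresh classification head, then fine-tune on labeled data.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=3,  # hypothetical task: route tickets to billing/tech/other
)
# Standard supervised fine-tuning (e.g. with transformers' Trainer) then
# updates these weights on a small labeled dataset, reusing everything
# the model learned during pre-training.
```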
Token
The basic unit of text that AI models process, roughly equivalent to 3/4 of a word in English. Tokens are how models read, price, and limit input and output, making token efficiency a key cost lever.
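A quick way to see tokenization and its cost implications, using OpenAI's open-source tiktoken library; the per-token rate below is a made-up placeholder, not real pricing:

```python
# Counting tokens with tiktoken, then estimating input cost.
# The dollar rate is a hypothetical placeholder, not any vendor's price.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Tokens are how models read, price, and limit text."
tokens = enc.encode(text)
print(len(text.split()), "words ->", len(tokens), "tokens")

price_per_million = 3.00                       # hypothetical $ per 1M input tokens
cost = len(tokens) / 1_000_000 * price_per_million
print(f"input cost: ${cost:.6f}")
```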
Pre-training
The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
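A back-of-envelope sketch of why pre-training is so expensive, using the widely cited ~6·N·D FLOPs approximation (N parameters, D training tokens); every input below is an assumption, not any specific model's real figures:

```python
# Back-of-envelope pre-training compute via the ~6 * N * D approximation.
# All numbers are illustrative assumptions, not real model figures.
n_params = 500e9           # assumed parameter count (0.5T)
n_tokens = 10e12           # assumed training tokens (10T)
flops = 6 * n_params * n_tokens               # ~3.0e25 FLOPs total

gpu_flops = 400e12         # assumed sustained throughput per GPU, FLOP/s
gpu_hours = flops / gpu_flops / 3600          # ~20.8M GPU-hours
cost = gpu_hours * 2.50                       # assumed $2.50 per GPU-hour

print(f"{flops:.1e} FLOPs ~ {gpu_hours/1e6:.1f}M GPU-hours ~ ${cost/1e6:.0f}M")
```

With these assumed inputs the arithmetic lands around $52M for compute alone, consistent with the tens-to-hundreds-of-millions range above.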