AI Glossary
The AI dictionary for CTOs, founders, investors, product leaders, board members, and other decision-makers.
30 AI models compared. 145 terms explained. Built for CTOs, founders, and investors who need to speak the language without reading the papers.
- A2A (Agent-to-Agent)
- A protocol that enables AI agents built by different vendors to discover, authenticate, and collaborate with each other. A2A standardizes how agents delegate sub-tasks, share context, and return results across organizational boundaries.
- Agent
- An AI system that can autonomously plan, use tools, and execute multi-step tasks on behalf of a user. Agents are widely seen as the next major product paradigm after chatbots, with every major lab shipping agent frameworks.
- Agentic orchestration
- The architecture pattern of coordinating multiple AI agents to accomplish complex tasks, with a supervisor agent routing work, managing state, and combining results from specialized sub-agents.
- Agentic RAG
- A retrieval-augmented generation system where an AI agent autonomously decides when, what, and how to retrieve information — dynamically choosing between multiple knowledge sources, reformulating queries, and iterating on results rather than making a single fixed retrieval call.
- Agentic workflow
- A multi-step process where an AI agent plans, executes, evaluates, and iterates on tasks with minimal human intervention. Unlike single-turn prompts, agentic workflows involve loops, branching logic, and tool calls that unfold over minutes or hours.
- AgentOps
- The operational discipline of deploying, monitoring, debugging, and managing AI agents in production. AgentOps encompasses observability, cost tracking, failure recovery, human-in-the-loop escalation, and compliance auditing for autonomous AI systems.
- AGI (Artificial General Intelligence)
- A hypothetical AI system that matches or exceeds human-level reasoning across every cognitive domain. No AGI exists today, but the race to build one is driving hundreds of billions in investment.
- AI Bill of Materials (AIBOM)
- A comprehensive inventory documenting every component of an AI system — training data sources, model architecture, fine-tuning datasets, third-party APIs, infrastructure dependencies, and known limitations. An AIBOM is the AI equivalent of a software bill of materials (SBOM), designed for auditability and regulatory compliance.
- AI Co-Scientist
- An AI system designed to collaborate with human researchers on scientific discovery — generating hypotheses, designing experiments, analyzing results, and iterating on findings autonomously. Unlike general AI assistants, co-scientists are domain-specialized and can operate within the scientific method.
- AI Energy Consumption
- The total electrical power consumed by AI systems across training, inference, and supporting infrastructure — including data centers, cooling, and networking. AI energy consumption is growing rapidly, with the International Energy Agency projecting that data centers, driven largely by AI, could account for roughly 3% of global electricity by 2030.
- AI governance
- The organizational frameworks, policies, and processes that govern how AI systems are developed, deployed, monitored, and retired within an enterprise. AI governance covers model risk management, bias auditing, access controls, and regulatory compliance.
- AI PC
- A personal computer equipped with a dedicated Neural Processing Unit (NPU) designed to run AI inference workloads locally, without relying on cloud APIs. AI PCs enable on-device features like real-time translation, image generation, and local copilots.
- AI Red Lines
- Absolute boundaries on AI capabilities or deployments that should never be crossed regardless of economic incentive — such as autonomous weapons systems, mass surveillance without consent, or AI systems that manipulate democratic processes. AI red lines represent the governance community's attempt to establish non-negotiable limits before capabilities outpace policy.
- AI safety
- The interdisciplinary field focused on ensuring AI systems behave as intended and do not cause unintended harm. Encompasses alignment research, red teaming, content filtering, and policy advocacy.
- AI slop
- Low-quality, mass-produced AI-generated content flooding the internet, including formulaic blog posts, recycled social media content, and SEO spam. The term emerged as a cultural backlash against undifferentiated AI output.
- AI Supply Chain
- The end-to-end chain of dependencies that an AI system relies on — from training data providers and model vendors to inference infrastructure, third-party plugins, and monitoring tools. AI supply chain risk is the emerging security discipline focused on vulnerabilities at every link in this chain.
- AI winter
- A period of reduced funding, hype, and progress in AI research. The field experienced major winters in the 1970s and late 1980s. Investors watch for signs of a new winter whenever growth narratives stall.
- AI wrapper
- A product that provides a user interface or workflow layer on top of a foundation model API, adding relatively little proprietary technology. 'Wrapper' is often used pejoratively to imply thin differentiation and vulnerability to platform risk.
- AI-first GTM
- A go-to-market strategy where the product's core value proposition, distribution channels, and growth loops are built around AI capabilities from day one, rather than adding AI features to an existing product.
- AI-native
- A company or product built from the ground up around AI capabilities, rather than bolting AI onto legacy software. AI-native startups often have fundamentally different cost structures and GTM motions.
- Alignment
- The challenge of making an AI system's goals and behaviors match human intentions and values. Misalignment risk grows as models become more capable, making this a top priority for safety teams.
- API (Application Programming Interface)
- The programmatic interface that lets developers send prompts to an AI model and receive responses. Model vendors like OpenAI, Anthropic, and Google monetize primarily through API access, priced per token.
- ASI (Artificial Superintelligence)
- A theoretical AI that dramatically surpasses the best human minds in every field. ASI remains speculative, but its possibility shapes long-term safety research and existential-risk debates.
- Attention mechanism
- The core innovation inside transformers that lets a model weigh the relevance of every token against every other token in a sequence. Attention is what makes modern LLMs understand context and long-range dependencies.
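A minimal pure-Python sketch of scaled dot-product attention, the computation this entry describes. The 2-token, 2-dimensional values are toy numbers; real models add learned query/key/value projections and multiple heads.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    # Each query token mixes every token's value vector,
    # weighted by query-key similarity (scaled dot products).
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Toy example: 2 tokens with 2-dimensional representations.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0, 0.0], [0.0, 10.0]]
ctx = attention(Q, K, V)  # each row is a context-mixed representation
```

Because the first query aligns with the first key, the first output row leans toward the first value vector; that weighting over the whole sequence is what "context" means mechanically.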
- Autoregressive model
- A model that generates output one token at a time, with each new token conditioned on all previous tokens. GPT, Claude, and Gemini are all autoregressive, which is why they stream responses word by word.
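The token-by-token loop can be sketched in a few lines. The "model" here is a hypothetical bigram lookup table standing in for a real LLM; the point is the loop shape, where each new token is appended to the context before the next prediction.

```python
# Toy autoregressive generation: feed the sequence so far back into the
# model at every step. BIGRAMS is an invented stand-in for a real model.
BIGRAMS = {"the": "cat", "cat": "sat", "sat": "down"}

def next_token(tokens):
    # A real model conditions on the whole context; this toy
    # only looks at the last token.
    return BIGRAMS.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new=5):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        tok = next_token(tokens)  # conditioned on everything so far
        if tok == "<eos>":
            break
        tokens.append(tok)        # new token joins the context
    return tokens

out = generate(["the"])  # → ["the", "cat", "sat", "down"]
```

This one-token-at-a-time dependency is exactly why autoregressive models stream responses word by word.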
- Batch processing
- Running multiple AI inference requests together to maximize throughput and reduce per-request cost. Batch processing is how companies handle large-scale data labeling, content generation, and analytics workloads efficiently.
- Benchmark
- A standardized test used to compare AI model performance. Common benchmarks include MMLU, HumanEval, and GSM8K. While useful for ranking, benchmarks can be gamed and may not reflect real-world value.
- Benchmark gaming
- The practice of optimizing a model's performance on specific benchmarks without corresponding improvements in general capability, either through targeted training data, prompt engineering, or architectural shortcuts.
- Bias (in AI)
- Systematic errors in model outputs that reflect skewed training data or flawed design choices. Bias can lead to unfair outcomes in hiring, lending, and content moderation, creating legal and reputational risk.
- Capability elicitation
- Techniques for discovering the full extent of what an AI model can do, including hidden or emergent capabilities that were not explicitly trained for. Elicitation probes whether a model has dangerous capabilities that standard benchmarks might miss.
- Catastrophic forgetting
- When a model loses previously learned knowledge after being trained on new data. Catastrophic forgetting is a key challenge in continual learning and one reason fine-tuning must be done carefully.
- Chain-of-thought (CoT)
- A prompting technique that instructs a model to reason step by step before giving a final answer. CoT dramatically improves accuracy on math, logic, and multi-step problems and is now built into many model architectures.
- Chief AI Officer (CAIO)
- A C-suite executive responsible for an organization's AI strategy, governance, model operations, and cross-functional AI adoption. The CAIO role emerged as AI became too strategic and complex for existing CTO or CDO functions to absorb.
- Closed-source AI
- AI models whose weights, training data, and architecture are proprietary and accessible only through APIs. OpenAI, Anthropic, and Google run closed-source models, monetizing via usage-based pricing.
- Co-pilot
- An AI assistant that works alongside a human user within an existing workflow, providing suggestions, automating sub-tasks, and augmenting productivity while keeping the human in control of final decisions.
- Compute overhang
- A situation where available compute capacity grows faster than the algorithmic improvements needed to use it, creating a stockpile of unused potential. A sudden algorithmic breakthrough can then unlock rapid capability jumps.
- Computer use
- An AI capability where a model can directly interact with a computer's graphical interface, clicking buttons, typing text, navigating menus, and reading screen content just like a human user would.
- Constitutional AI
- A training methodology developed by Anthropic where an AI model evaluates its own outputs against a written set of principles (a 'constitution') and self-corrects, reducing reliance on human feedback for safety alignment.
- Content filtering
- Automated systems that screen AI inputs and outputs for harmful, illegal, or off-brand material. Filters are essential for production deployment but can also over-block legitimate use cases.
- Context engineering
- The practice of strategically designing and managing the full context that is fed to an AI model, including system prompts, retrieved documents, conversation history, tool outputs, and structured metadata, to maximize response quality.
- Context window
- The maximum number of tokens a model can process in a single request, including both the prompt and the response. Larger context windows (100K-2M tokens) let models ingest entire codebases or documents at once.
- Data curation
- The process of selecting, cleaning, filtering, deduplicating, and organizing training data to maximize model quality. Data curation is increasingly recognized as more important than dataset size for model performance.
- Data flywheel
- A self-reinforcing loop where user interactions generate data that improves the model, which attracts more users, generating more data. Data flywheels are among the strongest moats in AI.
- Data moat
- A competitive advantage derived from proprietary datasets that competitors cannot easily obtain or replicate. Data moats can come from user-generated content, domain-specific corpora, real-world telemetry, or exclusive licensing agreements.
- Data poisoning
- An attack that corrupts a model's training data to introduce backdoors, biases, or degraded performance. Data poisoning can be targeted (affecting specific outputs) or untargeted (generally degrading model quality).
- Deep learning
- A subset of machine learning that uses neural networks with many layers to learn complex patterns from data. Deep learning powers virtually all modern AI breakthroughs, from image recognition to language generation.
- Diffusion model
- A generative model that creates images (or other data) by starting with random noise and iteratively refining it. Stable Diffusion and DALL-E 3 use diffusion-based architectures, and Midjourney is widely believed to as well.

- Distillation
- The process of training a smaller, cheaper model to mimic the behavior of a larger, more capable one. Distillation is how companies ship AI to edge devices and reduce inference costs without sacrificing too much quality.
- DPO (Direct Preference Optimization)
- A training technique that aligns language models with human preferences by directly optimizing on preference data, without needing a separate reward model. DPO simplifies the RLHF pipeline while achieving comparable alignment quality.
- Efficient model
- A model designed to deliver strong performance at a fraction of the compute cost of frontier models, through architectural innovations, aggressive distillation, or better training data curation. Efficient models prioritize the performance-per-dollar ratio.
- Embedding
- A numerical vector representation of text, images, or other data that captures semantic meaning. Embeddings power search, recommendations, and RAG systems by letting you find conceptually similar content.
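The "conceptually similar" comparison is usually cosine similarity between embedding vectors. A sketch with hypothetical 3-dimensional embeddings (real ones have hundreds or thousands of dimensions, produced by an embedding model):

```python
import math

def cosine_similarity(a, b):
    # Similar meanings -> vectors pointing the same way -> score near 1.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Invented toy embeddings for illustration only.
emb = {
    "refund my order":        [0.9, 0.1, 0.0],
    "give me my money back":  [0.8, 0.2, 0.1],
    "what is your address":   [0.0, 0.1, 0.9],
}
query = emb["refund my order"]
best = max((k for k in emb if k != "refund my order"),
           key=lambda k: cosine_similarity(query, emb[k]))
# best -> "give me my money back": same meaning, different words
```

That match despite zero word overlap is the property that powers semantic search and RAG.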
- Embodied AI
- AI systems that interact with the physical world through a robotic body or sensor array, combining perception, planning, and motor control. Embodied AI bridges the gap between digital intelligence and physical action.
- Encoder-decoder
- A neural network architecture where an encoder compresses input into a representation and a decoder generates output from it. T5 and BART use this pattern, contrasting with decoder-only models like GPT.
- Endpoint
- A specific URL where an AI model is hosted and accepts API requests. Managing endpoints involves load balancing, rate limiting, and scaling to handle production traffic.
- Evals
- Systematic evaluation frameworks that measure AI model performance on specific tasks relevant to your use case, going beyond generic benchmarks to test the behaviors that actually matter for your application.
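A minimal eval harness sketch: run task-specific cases through a model function and score exact-match accuracy. `fake_model` and the cases are invented stand-ins for a real API call and a real test set.

```python
# Hypothetical eval cases for a tiny Q&A task.
CASES = [
    {"prompt": "2+2=",               "expected": "4"},
    {"prompt": "capital of France?", "expected": "Paris"},
    {"prompt": "3*3=",               "expected": "9"},
]

def fake_model(prompt):
    # Stand-in for a real model call; deliberately wrong on one case.
    answers = {"2+2=": "4", "capital of France?": "Paris", "3*3=": "6"}
    return answers[prompt]

def run_eval(model, cases):
    # Exact-match accuracy; real evals also use graders, rubrics, or LLM judges.
    passed = sum(1 for c in cases if model(c["prompt"]) == c["expected"])
    return passed / len(cases)

accuracy = run_eval(fake_model, CASES)  # 2 of 3 cases pass
```

The value of owning an eval like this is that it measures your task, not a public leaderboard's.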
- Explainability
- The ability to understand and articulate why an AI model produced a specific output. Regulators increasingly demand explainability in high-stakes domains like healthcare, finance, and criminal justice.
- Extended thinking
- A model feature where the AI explicitly allocates additional inference compute to reason through complex problems step by step before producing a final answer, with the reasoning process visible to the user or developer.
- Fairness
- The principle that AI systems should produce equitable outcomes across demographic groups. Achieving fairness requires careful dataset curation, evaluation metrics, and ongoing auditing.
- Few-shot prompting
- Providing a model with a small number of examples in the prompt to guide its behavior, without any fine-tuning. Few-shot is a fast, low-cost way to adapt a general model to a specific task.
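In practice few-shot prompting is just string assembly: a handful of labeled examples placed before the real query so the model imitates the pattern. The reviews and labels below are invented.

```python
# Hypothetical sentiment examples to steer the model's output format.
EXAMPLES = [
    ("Loved the product, works great!",        "positive"),
    ("Arrived broken and support ignored me.", "negative"),
]

def build_prompt(examples, query):
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The unanswered final slot is what the model completes.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

prompt = build_prompt(EXAMPLES, "Fast shipping and fair price.")
```

No weights change; the examples live only in the prompt, which is why this adapts a model in seconds rather than hours.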
- Fine-tuning
- The process of continuing to train a pre-trained model on a smaller, task-specific dataset. Fine-tuning customizes model behavior for specific domains or formats and is a key part of most enterprise AI deployments.
- Flash attention
- An optimized implementation of the attention mechanism that reduces memory usage and increases speed by restructuring how attention computations access GPU memory, avoiding the need to materialize the full attention matrix.
- FLOPS (Floating-Point Operations Per Second)
- A measure of computational throughput. Training a frontier model consumes on the order of 10^25 total floating-point operations, so the FLOPS a cluster can sustain is a proxy for the cost and scale of an AI project.
- Foundation model
- A large, general-purpose model pre-trained on broad data that can be adapted to many downstream tasks. GPT-4, Claude, Gemini, and Llama are all foundation models. The term signals massive upfront investment and wide applicability.
- Frontier model
- The most capable AI model available at any given time, representing the current state of the art. Frontier models push the boundaries of what AI can do and are typically the most expensive to train and run.
- Function calling
- A model capability that lets the AI output structured tool invocations (API calls, database queries, etc.) rather than plain text. Function calling is what turns a chatbot into an agent that can take real-world actions.
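The mechanics: the model emits a structured tool invocation, and the application looks up and executes the named function. A sketch with an invented tool and a simplified JSON shape, not any vendor's exact wire format:

```python
import json

# Hypothetical tool registry; real apps map names to API clients, DB queries, etc.
TOOLS = {
    "get_weather": lambda city: f"22C and sunny in {city}",
}

# What a function-calling model might return instead of prose.
model_output = json.dumps({
    "tool": "get_weather",
    "arguments": {"city": "Berlin"},
})

def dispatch(raw):
    call = json.loads(raw)
    fn = TOOLS[call["tool"]]        # look up the requested tool
    return fn(**call["arguments"])  # execute with model-provided args

result = dispatch(model_output)
```

The tool result is then fed back into the model's context so it can continue reasoning with real-world data — that round trip is the agent loop.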
- GAN (Generative Adversarial Network)
- A model architecture where two neural networks (a generator and a discriminator) compete to produce increasingly realistic synthetic data. GANs dominated image generation before diffusion models took over.
- GPU (Graphics Processing Unit)
- The hardware chip that powers AI training and inference. NVIDIA's H100 and B200 GPUs are the most sought-after compute in the industry, with wait times and pricing driving major strategic decisions.
- Grounding
- Techniques that anchor AI outputs in verifiable facts by connecting models to external knowledge sources. Grounding reduces hallucination and is essential for enterprise use cases where accuracy is non-negotiable.
- Guardrails
- Programmatic rules and safety layers that constrain AI model behavior in production. Guardrails can block prompt injection, enforce output formats, prevent policy violations, and ensure brand-safe responses.
- Hallucination
- When an AI model generates confident-sounding but factually incorrect or fabricated information. Hallucination is the number-one barrier to enterprise AI adoption and a major focus of current research.
- Horizontal AI
- An AI product or platform designed to serve many industries and use cases (e.g., ChatGPT, Copilot). Horizontal plays compete on breadth and distribution, but face commoditization pressure.
- Human-in-the-loop (HITL)
- A design pattern where a human reviews, approves, or corrects AI outputs before they take effect in the real world. HITL balances AI automation benefits with human judgment for high-stakes decisions.
- Hyperscaler
- A cloud computing provider operating at massive scale, primarily Microsoft Azure, Amazon AWS, and Google Cloud. Hyperscalers provide the GPU infrastructure, managed AI services, and global data center networks that power most AI deployments.
- In-context learning
- A model's ability to learn new tasks from examples provided in the prompt, without any weight updates. In-context learning is what makes few-shot and zero-shot prompting work and is a defining feature of large language models.
- Inference
- The process of running a trained model to generate predictions or outputs from new inputs. Inference cost per token is the key economic metric for AI deployment and is falling rapidly.
- Inference cost
- The expense of running an AI model in production, typically measured per million tokens. Inference costs have dropped 10-100x in the past two years, enabling new business models and use cases.
- Inference economics
- The study of costs, pricing models, and margin structures around running AI models in production, encompassing hardware costs, model efficiency, pricing strategies, and the competitive dynamics of the inference market.
- Inference-Time Scaling
- The strategy of allocating additional compute at inference time — rather than during training — to improve model performance on complex queries. Instead of making a bigger model, inference-time scaling makes the existing model think harder on problems that warrant it.
- Jailbreak
- A technique for bypassing an AI model's safety guardrails to elicit outputs the model was trained to refuse, such as harmful instructions, restricted content, or system prompt leaks.
- KV cache
- A memory structure that stores the key and value matrices from previous attention computations during autoregressive generation, avoiding redundant recalculation as each new token is produced. KV caching is essential for efficient inference.
- Latency
- The time between sending a request to an AI model and receiving the first token of the response. Low latency is critical for real-time applications like coding assistants, voice agents, and live customer support.
- LLM (Large Language Model)
- A neural network trained on massive text corpora to predict and generate language. LLMs like GPT-4, Claude, and Gemini are the foundation of the current AI wave, powering chatbots, coding tools, and enterprise automation.
- Long-context model
- An AI model capable of processing extremely long inputs, typically 100K to 2M+ tokens in a single request. Long-context models can ingest entire books, codebases, or document collections without chunking.
- LoRA (Low-Rank Adaptation)
- A parameter-efficient fine-tuning technique that injects small trainable matrices into a frozen model. LoRA lets companies customize large models at a fraction of the cost of full fine-tuning.
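The arithmetic behind the cost savings: instead of updating a full d×d weight matrix, LoRA trains two skinny matrices A (r×d) and B (d×r) and applies W + B·A. A toy sketch with d=4 and rank r=1 (real layers have d in the thousands and r of 8-64):

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

d, r = 4, 1
# Frozen pretrained weight (identity here, purely for illustration).
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
B = [[0.1] for _ in range(d)]    # d x r, trainable
A = [[1.0, 0.0, 0.0, 0.0]]      # r x d, trainable

delta = matmul(B, A)             # d x d low-rank update
W_eff = [[W[i][j] + delta[i][j] for j in range(d)] for i in range(d)]

trainable = d * r + r * d        # 8 values vs d*d = 16 for full fine-tuning
```

At real scales the ratio is far more dramatic: a rank-8 adapter on a 4096×4096 layer trains ~65K values instead of ~16.8M.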
- MCP (Model Context Protocol)
- An open standard (created by Anthropic) that lets AI models connect to external tools, data sources, and services through a unified interface. MCP is becoming the USB-C of AI integrations.
- Mixture of Agents (MoA)
- An architecture where multiple AI models collaborate on the same task in layers — each model refines or critiques the previous models' outputs before producing a final response. Unlike Mixture of Experts (which routes to sub-networks within one model), Mixture of Agents uses entirely separate models working together as an ensemble.
- Mixture of Experts (MoE)
- A model architecture that routes each input to a subset of specialized sub-networks (experts) rather than using the full model. MoE dramatically reduces inference cost while maintaining quality; Mixtral uses it openly, and GPT-4 is widely reported to as well.
- Moat
- A sustainable competitive advantage that prevents rivals from replicating your position. In AI, moats can come from proprietary data, distribution, fine-tuned models, vertical expertise, or switching costs, but raw model capability is rarely a moat.
- Model card
- A standardized documentation format that describes a model's intended use, training data, performance metrics, limitations, and ethical considerations. Model cards promote transparency and help users make informed decisions about model selection.
- Model collapse
- A degradation phenomenon where AI models trained on data generated by other AI models progressively lose quality, diversity, and capability. Model collapse occurs when synthetic data replaces human-generated data in training pipelines.
- Model Routing
- A system that dynamically selects which AI model should handle each request based on the query's complexity, cost constraints, latency requirements, or domain. Model routing lets applications use expensive frontier models only when needed and cheap efficient models for everything else.
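A router can start as a simple heuristic before graduating to a learned classifier. A sketch with hypothetical model names and invented complexity markers:

```python
def route(query: str) -> str:
    # Naive complexity heuristics; production routers often use a small
    # classifier model or past-performance data instead.
    complex_markers = ("prove", "analyze", "step by step", "architecture")
    if len(query) > 500 or any(m in query.lower() for m in complex_markers):
        return "frontier-model"   # expensive, most capable
    return "efficient-model"      # cheap, fast, fine for simple asks

cheap = route("What time is it?")
pricey = route("Analyze this contract clause by clause")
```

Even a crude router like this can cut spend sharply if most traffic is simple, which is the economic argument behind the pattern.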
- Multi-modal
- An AI model that can process and generate multiple data types, such as text, images, audio, and video in a single system. Multi-modal models like GPT-4o and Gemini are converging previously separate AI capabilities.
- Narrow AI
- AI systems designed for a specific task or domain, such as image classification or fraud detection. Nearly all commercially deployed AI today is narrow, though the generality of modern LLMs is blurring the line.
- Natural language processing (NLP)
- The branch of AI focused on enabling machines to understand, interpret, and generate human language. NLP underpins chatbots, translation, search, and document analysis across every industry.
- Neural network
- A computing architecture inspired by the brain, made of layers of interconnected nodes (neurons) that learn patterns from data. Neural networks are the fundamental building block of all modern AI.
- One-person unicorn
- A concept where a solo founder or tiny team can build a billion-dollar company by leveraging AI to automate functions that previously required large teams, including engineering, design, marketing, customer support, and operations.
- Open weight
- A model whose trained parameters are publicly downloadable but whose training data and code may not be shared. Most 'open-source' models are technically open-weight, an important legal and strategic distinction.
- Open-source AI
- AI models released with open weights and (sometimes) training data, allowing anyone to use, modify, and deploy them. Meta's Llama and Mistral's models lead the open-source wave, competing with closed models from OpenAI and Anthropic.
- Orchestration
- The coordination layer that manages the flow of data, context, and control between multiple AI models, tools, and data sources within a complex application. Orchestration frameworks handle routing, error recovery, state management, and multi-step workflows.
- Overfitting
- When a model performs well on training data but poorly on new, unseen data because it memorized patterns rather than learning generalizable features. Overfitting is a constant risk in fine-tuning and custom model development.
- Parameter
- A learnable value inside a neural network that gets adjusted during training. Model size is measured in parameters (e.g., 70B, 405B), which roughly correlates with capability and cost.
- Post-Training
- The phase of model development that happens after initial pre-training — including supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), direct preference optimization (DPO), and safety tuning. Post-training is what transforms a raw language model into a useful, aligned assistant.
- Pre-training
- The initial phase of model training where the network learns general knowledge from a massive dataset. Pre-training is the most expensive phase, often costing tens or hundreds of millions of dollars for frontier models.
- Pre-training data
- The massive datasets used to train foundation models during the pre-training phase, typically composed of web crawls, books, academic papers, code repositories, and other text sources. Pre-training data quality and composition directly determine model capabilities.
- Prompt Caching
- A technique that stores and reuses the processed representation of frequently repeated prompt prefixes — system prompts, few-shot examples, document context — so the model does not recompute them on every request. Prompt caching can reduce latency by up to 85% and cost by up to 90% for repetitive workloads.
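Conceptually, prompt caching keys the expensive processed state of a stable prefix by its hash, so repeats skip the recompute. In the sketch below `encode_prefix` stands in for the model's real prefill computation, which providers handle server-side:

```python
import hashlib

cache = {}
calls = {"encodes": 0}  # counts how often the expensive path runs

def encode_prefix(prefix: str):
    calls["encodes"] += 1                  # expensive prefill happens here
    return f"<kv-state for {len(prefix)} chars>"

def get_prefix_state(prefix: str):
    key = hashlib.sha256(prefix.encode()).hexdigest()
    if key not in cache:
        cache[key] = encode_prefix(prefix)  # miss: pay full cost once
    return cache[key]

SYSTEM = "You are a helpful support agent. " * 50  # long, repeated prefix
get_prefix_state(SYSTEM)
get_prefix_state(SYSTEM)  # cache hit: no second prefill
```

The savings scale with how much of each request is a verbatim repeat, which is why long system prompts and few-shot examples benefit most.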
- Prompt engineering
- The practice of crafting inputs to AI models to elicit desired outputs. Prompt engineering has become a critical skill and even a job title, though its importance may decrease as models improve at understanding intent.
- Prompt injection
- An attack where malicious text in a prompt tricks an AI model into ignoring its instructions or leaking sensitive data. Prompt injection is the top security concern for production AI applications.
- Quantization
- Reducing the numerical precision of a model's weights (e.g., from 32-bit to 4-bit) to shrink its memory footprint and speed up inference. Quantization makes it possible to run large models on consumer hardware.
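A minimal sketch of symmetric int8 quantization: map floats to 8-bit integers with a single scale factor, then dequantize to see the rounding error. Real schemes add per-channel scales, zero points, and calibration.

```python
def quantize_int8(weights):
    # One scale maps the largest magnitude onto the int8 range [-127, 127].
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.02, -0.51, 0.33, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each weight now needs 1 byte instead of 4, and the worst-case error is half a quantization step — the memory-for-precision trade this entry describes.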
- RAFT (Retrieval-Augmented Fine-Tuning)
- A training technique that combines retrieval-augmented generation with supervised fine-tuning by teaching a model to answer questions given both relevant and irrelevant retrieved documents. RAFT trains the model to cite the right sources and ignore distractors, producing more accurate and grounded responses than either RAG or fine-tuning alone.
- RAG (Retrieval-Augmented Generation)
- A technique that retrieves relevant documents from an external knowledge base and feeds them to a model alongside the user's query. RAG reduces hallucination and keeps responses grounded in current, factual data.
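The pipeline in miniature: retrieve the most relevant document, then pack it into the prompt so the model answers from retrieved facts. Retrieval here is naive word overlap for self-containment; production systems use embeddings and a vector store. The documents are invented.

```python
# Hypothetical knowledge base.
DOCS = [
    "The refund window is 30 days from delivery.",
    "Our headquarters are located in Amsterdam.",
    "Support is available on weekdays from 9 to 5.",
]

def retrieve(query, docs):
    # Toy relevance score: shared words between query and document.
    q_words = set(query.lower().split())
    return max(docs, key=lambda d: len(q_words & set(d.lower().split())))

def build_rag_prompt(query):
    context = retrieve(query, DOCS)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_rag_prompt("how many days do I have for a refund")
```

Because the answer is injected as context rather than recalled from weights, the model can stay current and cite sources — the grounding benefit the entry describes.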
- Rate limiting
- Controls that cap the number of API requests a user or application can make in a given time period. Rate limits are how AI providers manage capacity, prevent abuse, and enforce pricing tiers.
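A common implementation is the token bucket: each request spends a token, and tokens refill at a fixed rate. A sketch with illustrative capacity and rate (real API limits are per-key, per-model, and often separate for requests and tokens):

```python
class TokenBucket:
    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, then spend one token if possible.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_sec=1)
# Three instant requests, then one after 1.5 seconds of refill.
results = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.5)]
```

The burst capacity plus steady refill is why clients should implement retry-with-backoff rather than hammering a 429 response.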
- Reasoning model
- An AI model specifically designed to perform multi-step reasoning, typically by generating an explicit chain of thought before producing a final answer. Reasoning models trade inference speed and cost for dramatically improved performance on complex problems.
- Red teaming
- The practice of systematically probing an AI system to find vulnerabilities, biases, and failure modes before deployment. Red teaming is now standard practice at major AI labs and increasingly required by regulation.
- Reinforcement Learning from Human Feedback (RLHF)
- A training technique where human raters rank model outputs, and the model learns to prefer higher-ranked responses. RLHF is what makes AI assistants helpful, harmless, and conversational rather than just autocomplete.
- Responsible AI
- A framework for developing and deploying AI systems that are ethical, transparent, and accountable. Responsible AI practices are becoming table stakes for enterprise procurement and regulatory compliance.
- Responsible scaling policy
- A governance framework that ties the deployment of increasingly capable AI models to demonstrated safety evaluations, creating commitments about what safety conditions must be met before a model can be released or scaled.
- Retrieval
- The process of finding and fetching relevant information from a knowledge base, database, or document store to provide context for an AI model. Retrieval quality is the single biggest determinant of RAG system performance.
- Scaling laws
- Empirical relationships showing that model performance improves predictably as you increase data, compute, and parameters. Scaling laws are why labs are pouring billions into ever-larger training runs.
- SFT (Supervised Fine-Tuning)
- The process of training a pre-trained model on a curated dataset of input-output examples that demonstrate the desired behavior. SFT is typically the first alignment step after pre-training, teaching the model to follow instructions and produce useful responses.
- Shadow AI
- The use of AI tools by employees without IT or management approval, bypassing corporate security policies and data governance controls. Shadow AI parallels shadow IT but with higher risk due to the data-hungry nature of AI tools.
- SLA (Service Level Agreement)
- A contract defining uptime, latency, and throughput guarantees for an AI service. Enterprise buyers evaluate AI vendors heavily on SLAs, especially for mission-critical applications.
- Sovereign AI
- A nation's capacity to develop and control its own AI capabilities — models, data, compute, and talent — without dependency on foreign vendors or infrastructure. Sovereign AI is the geopolitical framing of the AI race, driving over $100 billion in government-backed investments worldwide.
- Sparse model
- A neural network where only a fraction of parameters are activated for any given input, reducing compute requirements compared to dense models of the same total size. Mixture of Experts is the most common sparse architecture.
- Speculative decoding
- An inference optimization where a small, fast 'draft' model generates candidate tokens that a larger 'verifier' model checks in parallel, speeding up generation without changing output quality.
- Structured Outputs
- A model capability that guarantees the AI's response conforms to a specific schema — JSON, XML, or any developer-defined format — rather than freeform text. Structured outputs eliminate the need for fragile regex parsing and make LLM responses directly consumable by downstream code.
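The downstream side of the contract: validate the response against an expected shape before handing it to business logic. Real APIs can enforce a JSON schema at generation time; this sketch (with an invented ticket shape) shows what that guarantee replaces.

```python
import json

# Hypothetical expected shape for a support-ticket extraction task.
EXPECTED_FIELDS = {"name": str, "priority": int}

def parse_ticket(raw: str) -> dict:
    data = json.loads(raw)  # fails loudly on non-JSON prose
    for field, ftype in EXPECTED_FIELDS.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"bad or missing field: {field}")
    return data

ticket = parse_ticket('{"name": "Billing bug", "priority": 2}')
```

With schema-enforced generation the `ValueError` path becomes unreachable; without it, this validation layer is the fragile regex-and-retry code the entry alludes to.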
- Synthetic data
- Artificially generated training data created by AI models or simulations. Synthetic data is increasingly used when real data is scarce, private, or expensive, but quality and diversity remain open challenges.
- System prompt
- A set of instructions prepended to every conversation that defines the AI model's persona, constraints, and behavior. System prompts are how companies customize foundation models for specific products and brands.
- Test-time compute
- The practice of allocating additional compute during inference to improve output quality, rather than relying solely on the capabilities baked in during training. Reasoning models and extended thinking are the primary examples of test-time compute scaling.
- Throughput
- The number of tokens or requests an AI system can process per second. High throughput is essential for batch processing, high-traffic applications, and cost-efficient inference at scale.
- Token
- The basic unit of text that AI models process, roughly equivalent to 3/4 of a word in English. Tokens are how models read, price, and limit input and output, making token efficiency a key cost lever.
- Token pricing
- The cost model used by AI API providers, charging per million input and output tokens. Prices have fallen dramatically, from $60/M tokens (GPT-4, 2023) to under $1/M tokens for many models in 2026.
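The arithmetic is simple enough to sketch. Rates below are illustrative only; check your provider's current price sheet.

```python
def request_cost(input_tokens, output_tokens,
                 price_in_per_m, price_out_per_m):
    """Cost of one API call under per-million-token pricing."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000

# 2,000 input + 500 output tokens at $3/M in, $15/M out:
# 2000*3/1e6 + 500*15/1e6 = $0.0135 per request.
```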
- Tokenizer
- The algorithm that splits text into tokens before a model can process it. Different models use different tokenizers, which affects how efficiently they handle various languages, code, and specialized content.
- Tool use
- The ability of an AI model to invoke external tools, such as web search, code execution, or database queries, to augment its capabilities. Tool use transforms models from knowledge stores into action-taking agents.
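The mechanics, sketched with an invented tool registry: the model returns a structured message naming a tool and its JSON arguments, the application executes the call, and the result is fed back as a new message, closing the agent loop. Real APIs use similar (but provider-specific) declarations.

```python
import json

# Illustrative registry; a real deployment would declare each tool's
# name and parameter schema to the model up front.
TOOLS = {
    "get_weather": lambda args: {"temp_c": 21, "city": args["city"]},
}

def run_turn(model_reply):
    """Dispatch one model turn that may request a tool call.

    model_reply is a stand-in for the structured message a tool-using
    model returns: either plain text, or a tool name plus JSON
    arguments to execute.
    """
    if "tool" not in model_reply:
        return model_reply["text"]
    fn = TOOLS[model_reply["tool"]]
    result = fn(json.loads(model_reply["arguments"]))
    # This message goes back to the model as the tool's answer.
    return {"role": "tool", "content": json.dumps(result)}
```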
- TPU (Tensor Processing Unit)
- Google's custom AI accelerator chip, designed specifically for neural network workloads. TPUs power Google's internal AI training and are available via Google Cloud, competing with NVIDIA's GPU ecosystem.
- Training
- The process of teaching a neural network by feeding it data and adjusting its parameters to minimize prediction errors. Training frontier models now costs $100M+ and takes months on thousands of GPUs.
- Transfer learning
- Using knowledge learned from one task or dataset to improve performance on a different but related task. Transfer learning is why pre-trained foundation models can be fine-tuned for specialized applications.
- Transformer
- The neural network architecture behind virtually all modern language and multi-modal models. Introduced in Google's 2017 'Attention Is All You Need' paper, transformers use self-attention to process sequences in parallel.
- Vector
- An ordered list of numbers that represents data in a high-dimensional space. In AI, vectors (embeddings) encode semantic meaning, enabling similarity search, clustering, and retrieval-augmented generation.
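Similarity search over vectors usually means cosine similarity. A brute-force sketch with tiny hand-made 2-D vectors standing in for real embeddings; this linear scan is exactly what a vector database accelerates with approximate indexes.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, corpus):
    """Brute-force semantic search over (label, vector) pairs."""
    return max(corpus, key=lambda item: cosine_similarity(query, item[1]))
```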
- Vector database
- A database optimized for storing and querying high-dimensional vectors (embeddings). Vector databases like Pinecone, Weaviate, and Chroma are critical infrastructure for RAG, search, and recommendation systems.
- Vertical AI
- An AI product purpose-built for a specific industry, such as legal, healthcare, or finance. Vertical AI startups compete on domain expertise and data moats rather than raw model capability.
- Vibe coding
- A development approach where a programmer describes what they want in natural language and iterates with an AI coding assistant to produce working software, relying on the AI for implementation details rather than writing code line by line.
- Watermarking
- Techniques for embedding invisible signals in AI-generated content to identify its origin. Watermarking is a key tool for combating deepfakes and meeting emerging regulatory requirements around AI disclosure.
- Weights
- The numerical values inside a neural network that encode everything the model has learned during training. Model weights are the core intellectual property of AI companies and the subject of intense open-vs.-closed debates.
- World model
- An internal representation of how the world works that an AI system uses to predict outcomes, plan actions, and reason about physical or causal relationships. World models are considered essential for achieving general intelligence and advanced robotics.
- Zero-shot prompting
- Asking a model to perform a task with no examples, relying entirely on its pre-trained knowledge and instruction-following ability. Zero-shot capability is a key measure of model generality and usability.
Claude Haiku 3.5
Anthropic's fastest and most cost-effective model, optimized for near-instant responses at high throughput. Claude Haiku 3.5 is designed for latency-sensitive workloads like classification, extraction, and real-time chat.
Claude Opus 4
Anthropic's most capable model, built for complex reasoning, extended thinking, and tasks that demand sustained multi-step analysis. Claude Opus 4 sets a new bar for frontier intelligence with a 200K context window.
Claude Sonnet 4
Anthropic's best model for coding and balanced high-capability tasks. Claude Sonnet 4 offers near-Opus intelligence at significantly lower cost and latency, making it the default choice for most production workloads.
Codestral
Mistral's dedicated code generation model, purpose-built for code completion, generation, and understanding across 80+ programming languages. Codestral offers a 32K context window tuned specifically for software engineering workflows.
Command R+
Cohere's enterprise-grade language model optimized for retrieval-augmented generation (RAG), tool use, and multilingual business applications. Command R+ is designed to ground responses in enterprise data with high accuracy.
DALL-E 3
OpenAI's latest image generation model, capable of creating highly detailed images from natural language prompts. DALL-E 3 is natively integrated with ChatGPT and offers significant improvements in prompt adherence over its predecessor.
DeepSeek R1
DeepSeek's open-weight reasoning model that uses chain-of-thought to rival frontier closed-source models on math, science, and coding benchmarks. R1 demonstrated that open models can match proprietary reasoning capabilities.
DeepSeek V3
A 671B parameter mixture-of-experts model that matches or exceeds many frontier closed-source models while remaining fully open-weight. DeepSeek V3 was trained at a fraction of the cost of comparable Western models.
ElevenLabs
The leading AI voice synthesis platform offering ultra-realistic text-to-speech, voice cloning from short samples, and multilingual voice generation. ElevenLabs powers voice content for media companies, game studios, and accessibility tools.
Gemini 2.0 Flash
Google's fast, cost-efficient multimodal model from the Gemini 2.0 generation. Gemini 2.0 Flash processes text, images, audio, and video at high speed, optimized for latency-sensitive applications.
Gemini 2.5 Flash
Google's latest efficient model combining speed, low cost, and built-in thinking capabilities. Gemini 2.5 Flash brings reasoning abilities previously reserved for larger models into a fast, affordable package with a 1M token context.
Gemini 2.5 Pro
Google's most capable AI model, featuring a 1M token context window, native multimodal understanding, and advanced thinking capabilities. Gemini 2.5 Pro leads multiple benchmarks across reasoning, math, science, and coding.
Gemma 3
Google's open-weight 27B parameter model designed to run on a single GPU or even consumer hardware. Gemma 3 delivers surprisingly strong performance for its size, with multimodal vision capability and 128K context.
GPT-4.1
OpenAI's latest flagship model optimized for coding, instruction-following, and long-context reasoning. GPT-4.1 offers a 1M token context window and significant improvements in structured output and complex task completion.
GPT-4o
OpenAI's omni-modal flagship model capable of processing and generating text, images, and audio natively. GPT-4o delivers GPT-4-class intelligence at significantly faster speeds and lower costs.
GPT-4o mini
OpenAI's most cost-effective model, offering strong performance across text and vision tasks at a fraction of GPT-4o's cost. GPT-4o mini is optimized for high-volume, low-latency applications.
Grok 3
xAI's frontier model with real-time access to X (Twitter) data, trained on the Colossus supercomputer. Grok 3 combines strong reasoning capabilities with unique access to live social media signal.
Llama 4 Maverick
Meta's 400B parameter mixture-of-experts model with 128 experts, representing a major leap in open-weight model capability. Llama 4 Maverick is designed for complex reasoning and multilingual tasks at scale.
Llama 4 Scout
Meta's 109B parameter MoE model featuring an unprecedented 10M token context window. Llama 4 Scout is optimized for processing extremely long documents and codebases while maintaining strong general capability.
Midjourney v7
The latest version of Midjourney's image generation model, widely regarded as producing the highest aesthetic quality among AI image generators. v7 offers improved coherence, photorealism, and fine-grained control.
Mistral Large 2
Mistral's 123B parameter flagship model positioned as Europe's leading frontier AI. Mistral Large 2 offers strong multilingual performance, long context, and enterprise-focused features with open-weight availability.
Mistral Small
Mistral's efficient model optimized for fast, cost-effective inference while maintaining strong performance on core tasks. Mistral Small is designed for high-volume production deployments where speed and cost matter more than peak capability.
o3
OpenAI's most powerful reasoning model, designed for PhD-level science, advanced mathematics, and complex multi-step problem solving. o3 uses extended chain-of-thought reasoning to tackle problems that stump conventional models.
o4-mini
OpenAI's efficient reasoning model that brings chain-of-thought capabilities to a smaller, faster, and cheaper package. o4-mini is designed for applications that need reasoning but cannot tolerate the cost or latency of o3.
Phi-4
Microsoft Research's small language model that consistently outperforms models several times its size. Phi-4 demonstrates that high-quality training data curation can compensate for parameter count, achieving strong results with only 14B parameters.
Qwen 3
Alibaba's flagship 235B MoE language model with exceptional multilingual capabilities across 100+ languages. Qwen 3 offers both thinking and non-thinking modes, with open weights available for the full model family.
Sora
OpenAI's video generation model capable of creating realistic and imaginative video clips from text prompts. Sora can generate videos up to 20 seconds long with detailed scenes, complex camera motion, and multiple characters.
Stable Diffusion 3.5
Stability AI's latest open-weight image generation model offering high-quality image synthesis with full local deployment capability. SD 3.5 features improved text rendering, composition, and fine-grained control via its MMDiT architecture.
Suno v4
Suno's AI music generation platform that creates full songs — lyrics, vocals, and instrumentation — from text prompts. v4 delivers broadcast-quality output across genres with improved coherence and musical structure.
Whisper v3
OpenAI's open-source speech recognition model supporting transcription and translation across 100+ languages. Whisper v3 delivers near-human accuracy on diverse audio inputs including accented speech, background noise, and technical jargon.
Know the terms. Know the moves.