Mixture of Agents (MoA)
- Definition
- An architecture where multiple AI models collaborate on the same task in layers: models in each successive layer refine or critique the previous layer's outputs before a final response is produced. Unlike Mixture of Experts (which routes to sub-networks within one model), Mixture of Agents uses entirely separate models working together as an ensemble.
- Why it matters
- On several benchmarks, MoA ensembles have outperformed individual models, including frontier models: evidence that ensemble methods carry over from classical ML to LLMs. For CTOs, this means you do not need to bet on one vendor or one model. You can blend models from different providers for better results, often at lower cost than always reaching for the most expensive option. The strategic implication is significant: if open-source model ensembles can match or beat proprietary frontier models, the moat around any single model provider gets thinner. MoA shifts the competitive advantage from model training to orchestration intelligence.
- In practice
- Together AI's MoA research (June 2024) demonstrated that layered ensembles of open-source models could match or exceed GPT-4o's performance. Their AlpacaEval 2.0 leaderboard score of 65.1% beat GPT-4o's 57.5% using a layered approach combining Llama 3 70B, Qwen 1.5 72B, and Mistral models. The architecture works in rounds: layer-1 models each generate an independent response, then layer-2 models receive all layer-1 outputs and synthesize a refined answer; the final layer's output is what gets returned to the user. In practice, MoA adds latency (multiple sequential rounds of model calls), but the quality gains justify it for high-value tasks. Early adopters are using MoA for content quality scoring, complex analysis, and any task where accuracy matters more than speed.
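To make the two-layer pattern concrete, here is a minimal sketch in Python. It assumes an OpenAI-compatible chat completions client (for example, pointed at Together AI's endpoint); the model names, the aggregator choice, and the synthesis prompt are illustrative placeholders, not the exact configuration from the paper.

```python
# Minimal two-layer Mixture of Agents sketch.
# Assumes an OpenAI-compatible endpoint; model names and the aggregator
# prompt are illustrative, not the exact setup from Together AI's paper.
from openai import OpenAI

client = OpenAI(base_url="https://api.together.xyz/v1", api_key="YOUR_API_KEY")

PROPOSERS = [
    "meta-llama/Llama-3-70b-chat-hf",
    "Qwen/Qwen1.5-72B-Chat",
    "mistralai/Mixtral-8x22B-Instruct-v0.1",
]
AGGREGATOR = "Qwen/Qwen1.5-72B-Chat"


def ask(model: str, prompt: str) -> str:
    """Single chat completion call."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
    )
    return resp.choices[0].message.content


def mixture_of_agents(task: str) -> str:
    # Layer 1: each proposer answers the task independently.
    drafts = [ask(model, task) for model in PROPOSERS]

    # Layer 2: an aggregator sees all drafts and synthesizes one refined answer.
    synthesis_prompt = (
        "You are given several candidate responses to the same task. "
        "Critique them, keep the strongest points, and produce one refined answer.\n\n"
        f"Task: {task}\n\n"
        + "\n\n".join(f"Candidate {i + 1}:\n{d}" for i, d in enumerate(drafts))
    )
    return ask(AGGREGATOR, synthesis_prompt)


print(mixture_of_agents("Explain the trade-offs of microservices in two paragraphs."))
```

Adding a third layer would mean feeding the layer-2 synthesis (alongside the drafts) into another round of proposers before a final aggregation; each extra layer trades latency for refinement.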
Related terms
Mixture of Experts (MoE)
A model architecture that routes each input token to a small subset of specialized sub-networks (experts) rather than activating the full model. MoE sharply reduces compute per token while maintaining quality; it is used in Mixtral and is widely reported to underpin GPT-4.
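For contrast with MoA's model-level ensembling, here is a hypothetical top-2 gated MoE layer in PyTorch. The expert networks, dimensions, and routing loop are simplified placeholders meant only to show the idea of activating a few experts per token; they do not reflect any specific model's implementation.

```python
# Hypothetical top-2 MoE layer: a router scores experts per token and only the
# selected experts run, so compute per token is a fraction of the full model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                       # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)    # pick k experts per token
        weights = F.softmax(weights, dim=-1)          # normalize the k gate values
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out


layer = TopKMoE(dim=64)
tokens = torch.randn(10, 64)
print(layer(tokens).shape)  # torch.Size([10, 64])
```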
Model Routing
A system that dynamically selects which AI model should handle each request based on the query's complexity, cost constraints, latency requirements, or domain. Model routing lets applications use expensive frontier models only when needed and cheap efficient models for everything else.
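A routing policy can be as simple as a heuristic over the incoming query. The sketch below is hypothetical; the markers, word-count threshold, and model labels are placeholders, and production routers typically use a trained classifier or cost model instead.

```python
# Illustrative model router: send simple queries to a cheap model and complex
# ones to a frontier model. Markers, threshold, and model names are placeholders.
def route(query: str) -> str:
    complexity_markers = ("analyze", "compare", "prove", "refactor", "multi-step")
    looks_complex = (
        len(query.split()) > 200
        or any(marker in query.lower() for marker in complexity_markers)
    )
    return "frontier-model" if looks_complex else "small-efficient-model"


print(route("Summarize this paragraph in one sentence."))      # small-efficient-model
print(route("Analyze the trade-offs between these designs."))  # frontier-model
```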
Open-source AI
AI models released with open weights and (sometimes) training data, allowing anyone to use, modify, and deploy them. Meta's Llama and Mistral's models lead the open-source wave, competing with closed models from OpenAI and Anthropic.
Frontier model
The most capable AI model available at any given time, representing the current state of the art. Frontier models push the boundaries of what AI can do and are typically the most expensive to train and run.