Models & Architecture Deep Dive

Mixture of Agents (MoA)

Definition
An architecture where multiple AI models collaborate on the same task in layers — each model refines or critiques the previous models' outputs before producing a final response. Unlike Mixture of Experts (which routes to sub-networks within one model), Mixture of Agents uses entirely separate models working together as an ensemble.
Why it matters
On several benchmarks, MoA ensembles outperform any of their constituent models, and in some cases frontier models, evidence that ensemble approaches work for LLMs just as they did for classical ML. For CTOs, this means you do not need to bet on one vendor or one model. You can blend models from different providers for better results at lower cost than always reaching for the most expensive option. The strategic implication is significant: if open-source model ensembles can match or beat proprietary frontier models, the moat around any single model provider gets thinner. MoA shifts the competitive advantage from model training to orchestration intelligence.
In practice
Together AI's MoA research (June 2024) demonstrated that layered ensembles of open-source models could match or exceed GPT-4o performance. Their AlpacaEval 2.0 leaderboard score of 65.1% beat GPT-4o's 57.5% using a layered approach combining Llama 3 70B, Qwen 1.5 72B, and Mistral models. The architecture works in rounds: layer-1 models each generate an independent response, then layer-2 models receive all layer-1 outputs and synthesize a refined answer. MoA adds latency, since the layers are sequential model calls, but the quality gains can justify it for high-value tasks. Early adopters are using MoA for content quality scoring, complex analysis, and any task where accuracy matters more than speed.
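The two-layer round described above can be sketched in a few lines. This is a minimal illustration, not Together AI's implementation: the "models" here are stand-in functions, and in a real deployment each would be a call to a separate LLM endpoint (e.g. Llama 3 70B or Qwen 1.5 72B via an inference provider). The aggregator-prompt wording is an assumption.

```python
def run_moa(prompt, proposers, aggregator):
    """Layer 1: each proposer model answers the task independently.
    Layer 2: the aggregator sees all layer-1 drafts and synthesizes
    a single refined answer (the aggregate-and-synthesize pattern)."""
    proposals = [model(prompt) for model in proposers]  # independent drafts

    # Bundle the original task with every draft into one aggregator prompt.
    # The exact wording here is illustrative, not the paper's template.
    agg_prompt = (
        f"Task: {prompt}\n\n"
        + "\n".join(f"Response {i + 1}: {p}" for i, p in enumerate(proposals))
        + "\n\nSynthesize the best final answer from the responses above."
    )
    return aggregator(agg_prompt)


# Stub models standing in for real API calls (hypothetical, for illustration):
proposer_a = lambda p: f"[model-A draft for: {p}]"
proposer_b = lambda p: f"[model-B draft for: {p}]"
aggregator = lambda p: f"[synthesized answer from {p.count('Response')} drafts]"

print(run_moa("Explain MoA in one line.", [proposer_a, proposer_b], aggregator))
```

The latency cost is visible in the structure: layer 1 can run its proposers in parallel, but layer 2 cannot start until every draft is in, so each additional layer adds at least one full model call to the critical path.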

We cover models & architecture every week.

Get the 5 AI stories that matter — free, every Friday.
