Encoder-decoder
- Definition
- A neural network architecture in which an encoder compresses the input into an intermediate representation and a decoder generates the output from it, typically attending to the encoder's output via cross-attention. T5 and BART use this pattern, in contrast with decoder-only models like GPT.
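The split described above can be sketched with a toy model in plain NumPy (illustrative only, not a real transformer; all names and parameters here are made up): the encoder maps the input sequence to a context representation, and the decoder then generates output tokens one at a time conditioned on that context.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 16   # toy vocabulary size
D = 8        # embedding / hidden dimension

# Toy parameters (random; a real model learns these during training).
embed = rng.normal(size=(VOCAB, D))
W_out = rng.normal(size=(D, VOCAB))

def encode(input_ids):
    """Encoder: compress the input sequence into one context vector
    (mean-pooled embeddings; a real encoder uses self-attention)."""
    return embed[input_ids].mean(axis=0)

def decode(context, max_len=5, bos=0):
    """Decoder: generate output tokens autoregressively, each step
    conditioned on the encoder context and the previous token."""
    out, prev = [], bos
    for _ in range(max_len):
        h = context + embed[prev]      # combine context with last token
        logits = h @ W_out             # project hidden state to vocabulary
        prev = int(np.argmax(logits))  # greedy choice of next token
        out.append(prev)
    return out

src = [3, 7, 2]      # hypothetical input token ids
ctx = encode(src)    # encoder output: a single vector of shape (D,)
print(decode(ctx))   # decoder output: a list of 5 generated token ids
```

The key structural point the sketch preserves: the input is read once into a representation, and generation happens only on the decoder side, one token at a time.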
- Why it matters
- Understanding encoder-decoder versus decoder-only architectures helps you choose the right model for your task. Encoder-decoder models excel at tasks where you transform one sequence into another: translation, summarization, code transpilation. Decoder-only models (GPT, Claude, Llama) excel at open-ended generation and instruction following. The industry has largely consolidated around decoder-only for general-purpose LLMs, but encoder-decoder models remain superior for specific tasks and are more parameter-efficient for translation and structured output. For specialized production systems, knowing which architecture fits your task can save significant compute.
- In practice
- Google's T5 and FLAN-T5 remain popular encoder-decoder models for structured NLP tasks. Meta's NLLB (No Language Left Behind) uses an encoder-decoder architecture for translation across 200 languages. Whisper, OpenAI's speech recognition model, uses an encoder-decoder design where the encoder processes audio and the decoder generates text. However, the trend is clearly toward decoder-only: Claude, GPT-4, Gemini, and Llama are all decoder-only. The encoder-decoder architecture survives primarily in specialized applications where its efficiency advantages outweigh the benefits of using a general-purpose decoder-only model.
Related terms
Transformer
The neural network architecture behind virtually all modern language and multi-modal models. Introduced in Google's 2017 'Attention Is All You Need' paper, transformers use self-attention to process sequences in parallel.
Autoregressive model
A model that generates output one token at a time, with each new token conditioned on all previous tokens. GPT, Claude, and Gemini are all autoregressive, which is why they stream responses word by word.
LLM (Large Language Model)
A neural network trained on massive text corpora to predict and generate language. LLMs like GPT-4, Claude, and Gemini are the foundation of the current AI wave, powering chatbots, coding tools, and enterprise automation.