Model Wars | January 27, 2022 | via OpenAI Blog
Aligning language models to follow instructions
Why it matters
OpenAI deployed a new generation of instruction-tuned models trained with reinforcement learning from human feedback (RLHF), raising the baseline for user-intent adherence and safety. Making InstructGPT rather than GPT-3 the API default suggests a broader move toward alignment-first model development.
Key signals
- InstructGPT trained with human-in-the-loop RLHF alignment
- Outperforms GPT-3 on instruction-following
- Improved truthfulness and reduced toxicity vs. GPT-3
- Deployed as default language model on OpenAI API
- Published Jan 27, 2022
The hook
OpenAI just replaced GPT-3 as its API default. InstructGPT now fills that role: better at following instructions, more truthful, less toxic.
We’ve trained language models that are much better at following user intentions than GPT-3 while also making them more truthful and less toxic, using techniques developed through our alignment research. These InstructGPT models, which are trained with humans in the loop, are now deployed as the default language models on our API.
Relevance score: 78/100