InstructGPT
instructgpt-740b4460·2 events·first seen 28d agoAliases: InstructGPT
Co-occurring entities
More like this (12)
Recent events (2)
Aligning language models to follow instructions
OpenAI published a blog post describing their work on aligning language models to follow human instructions, corresponding to the InstructGPT research. This work introduced reinforcement learning from human feedback (RLHF) as a core technique for training models to be more helpful, honest, and aligned with user intent. The approach demonstrated that smaller instruction-tuned models could outperform larger base models on human preference evaluations, marking a foundational shift in how language models are trained and deployed.
Illustrating Reinforcement Learning from Human Feedback (RLHF)
This Hugging Face blog post provides an illustrated overview of Reinforcement Learning from Human Feedback (RLHF), explaining the technique used to align large language models with human preferences. It covers the core pipeline: pretraining a language model, collecting human preference data, training a reward model, and fine-tuning with RL. Published in December 2022, it served as an accessible reference during the period when RLHF was becoming central to frontier model development.