InstructGPT

model · active · instructgpt-740b4460 · 2 events · first seen 28d ago

Aliases: InstructGPT

Recent events (2)

8 · OpenAI Blog · 28d ago

Aligning language models to follow instructions

OpenAI published a blog post on aligning language models to follow human instructions, accompanying the InstructGPT research. The work applied reinforcement learning from human feedback (RLHF), a technique from earlier research, to fine-tune GPT-3 so that it would be more helpful, more truthful, and better aligned with user intent. It showed that a smaller instruction-tuned model could outperform a much larger base model on human preference evaluations: labelers preferred outputs from the 1.3B-parameter InstructGPT model over those of the 175B-parameter GPT-3. The result helped establish RLHF as a standard step in how language models are trained and deployed.

5 · Hugging Face Blog · 28d ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

This Hugging Face blog post gives an illustrated overview of Reinforcement Learning from Human Feedback (RLHF), the technique used to align large language models with human preferences. It covers the core pipeline: pretraining a language model, collecting human preference data, training a reward model on that data, and fine-tuning the language model with RL against the reward model. Published in December 2022, it served as an accessible reference during the period when RLHF was becoming central to frontier model development.
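
As a rough illustration of the reward-model step in that pipeline, the sketch below computes the pairwise preference loss commonly used for it: the negative log-sigmoid of the score gap between the response a human preferred and the one they rejected. It is a minimal sketch, not code from either post. The function names and the example scores are hypothetical, and a real reward model would produce the scores with a neural network rather than take them as plain floats.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Return -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the preferred response scores well above the
    rejected one, and large when the ordering is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of m
    # so that exp() is never called with a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(score_pairs):
    """Average the loss over (chosen_score, rejected_score) pairs."""
    losses = [pairwise_preference_loss(c, r) for c, r in score_pairs]
    return sum(losses) / len(losses)


# Hypothetical scores for three human-labeled comparisons.
example_pairs = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
print(mean_preference_loss(example_pairs))
```

Minimizing this loss pushes the reward model to score preferred responses higher than rejected ones; the RL stage then optimizes the language model against those scores.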