Almanac

Learning path

Alignment and RLHF: from first principles to frontier techniques

How do you take a raw language model and make it helpful, honest, and safe? This path traces the full arc — from the basics of reinforcement learning to the specific algorithms (RLHF, PPO, DPO, GRPO) that shape model behavior, and the labs and ideas pushing the frontier of alignment research. Take the steps in order; each one builds the vocabulary the next one needs.

Mixed level · 9 steps · ~62 min

  1. Reinforcement Learning

    Start here: reinforcement learning is the optimization framework that all the alignment algorithms below are built on — you need this vocabulary first.

  2. Reinforcement Learning from Human Feedback

    The core idea of the path: how human preference signals are turned into a training reward that steers a model toward helpful behavior.

  3. PPO

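    The workhorse optimizer inside classic RLHF pipelines. PPO's clipped policy updates keep training stable, while the extra models it relies on (a reward model, a value function, and a frozen reference policy) make it expensive.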

  4. Direct Preference Optimization (DPO)

    DPO reframes preference learning as a simpler supervised problem: it trains directly on pairs of preferred and rejected responses, with no separate reward model or RL loop. It is a direct contrast to the PPO approach you just read.

  5. GRPO

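    Group Relative Policy Optimization, a newer policy-gradient method that drops PPO's learned value function and instead scores each sampled response against the other responses to the same prompt, which cuts memory and compute.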

  6. Chain-of-Thought Reasoning

    Explicit reasoning traces give alignment techniques a richer signal to work with — and are central to how modern aligned models explain themselves.

  7. Scalable Oversight

    As models grow more capable, human feedback alone may not be enough — scalable oversight addresses how to supervise models that exceed human ability in a domain.

  8. Anthropic

    Anthropic is a lab with a prominent public focus on alignment research, which grounds the techniques above in who is actually building and deploying them.

  9. OpenAI

    OpenAI helped establish RLHF as a practical alignment tool, notably with InstructGPT, and continues to shape the field. It is a second institutional lens to close the path.
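
As a rough preview of the contrast drawn in steps 4 and 5, the sketch below computes the DPO loss for a single preference pair and GRPO-style group-relative advantages for one prompt. It is a minimal illustration, not any library's implementation: the function names, the `beta` default, and the use of the population standard deviation are assumptions made for this example, and the inputs are assumed to be summed log-probabilities and scalar rewards.

```python
import math
from statistics import mean, pstdev


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper).

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin)).
    return math.log1p(math.exp(-beta * margin))


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages (hypothetical helper).

    Standardizes the rewards of several responses sampled for the same
    prompt, so the group itself acts as the baseline instead of a learned
    value function. Population standard deviation is an assumption here.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example inputs (made-up numbers, for illustration only).
pair_loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When the policy and the reference model agree exactly, the margin is zero and the DPO loss equals log 2; it falls as the policy shifts probability toward the chosen response. In the GRPO-style helper, responses that score above their group's mean get positive advantages and the rest get negative ones.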