Learning path
How models learned to think: chain-of-thought, RL on verifiable rewards, and the reasoning frontier
This path traces the arc from a simple prompting trick to a full training paradigm — the story of how AI models went from pattern-matching to deliberate, step-by-step reasoning. It covers the core ideas (chain-of-thought, RL algorithms, verifiable rewards) before landing on the frontier models that embody them today.
Aimed at readers who know what a language model is and want to understand why reasoning models work the way they do — and what it took to get here.
In-depth · 7 steps · ~42 min