Almanac

Learning path

Long Context Evolution

How AI models learned to hold more in mind — from the architectural constraints that made early models forgetful, to the labs and model families pushing context windows into the millions of tokens. This path traces the evolution through the key players and models that drove it, starting with the foundational ideas and ending at the frontier.

Best for readers who want to understand why long context matters and who made it happen, in the order the story actually unfolded.

Mixed level · 10 steps · ~46 min

  1. Mamba

    Start here: Mamba is an alternative architecture, a state-space model that challenged the attention bottleneck head-on, making it the clearest lens for understanding why long context is hard in the first place.

  2. NVIDIA

    The hardware side of the story: memory and bandwidth limits on NVIDIA's GPUs are the infrastructure constraints that make scaling context windows expensive, and understanding them grounds every architectural trade-off that follows.

  3. Google DeepMind

    Google DeepMind pioneered some of the most aggressive long-context research, including the Gemini line's million-token windows — a key milestone in what became possible.

  4. Anthropic

    Anthropic's sustained focus on long, reliable context — and the safety questions it raises — made it a defining voice in how the industry thought about what to do with a large window.

  5. Claude

    The Claude model line is where Anthropic's long-context ambitions became concrete — trace how context length grew across versions and what changed with it.

  6. Claude Opus 4.6

    Claude Opus 4.6 represents a recent high-water mark in long-context capability, making it a useful case study in what frontier performance actually looks like today.

  7. OpenAI

    OpenAI's approach to context scaling — including GPT-4's extended windows and the tooling built around them — offers a counterpoint to Anthropic's and Google's strategies.

  8. Mistral AI

    Mistral AI's efficient, open-weight models show how the long-context push spread beyond the largest labs, with competitive windows at far lower compute cost.

  9. DeepSeek V4

    DeepSeek V4 brings the open-weight frontier into view, demonstrating that long-context capability is no longer the exclusive domain of closed, well-resourced labs.

  10. Hugging Face

    Close with Hugging Face — the platform where most of these open-weight long-context models land — to see how the ecosystem makes these advances accessible to practitioners.