Learning path

How models learned to remember more

The story of long context in AI, from the architectural foundations that make memory possible to the labs and models that have pushed the boundaries of how much a model can hold in mind at once. This path starts with the core ideas and works outward to the real-world systems built on them.

Whether you're new to AI or already familiar with the basics, this path traces a clear line from "what makes context hard" to "who solved it and how."

Mixed level · 9 steps · ~42 min

9 steps


Transformers
Start here: the Transformer architecture is the foundation that defines what a context window is and why scaling it is non-trivial.
Anthropic
Anthropic has been among the labs pushing long-context capability hardest; understanding its research direction sets up why the Claude model line looks the way it does.
Claude
The Claude model family is a concrete case study in long-context deployment, from early 100K-token windows to today's extended limits.
Claude Code
Claude Code shows long context in action — agentic coding tasks that depend on holding entire codebases in the window at once.
OpenAI
OpenAI's trajectory — from GPT-4's 8K window to GPT-5.5 — illustrates how the industry's leading lab has approached the same long-context challenge from a different angle.
GPT-5.5
GPT-5.5 is OpenAI's current flagship and a reference point for where long-context capability stands at the frontier today.
DeepSeek V4
DeepSeek V4 rounds out the picture with an open-weight perspective on long context, showing how the capability has spread beyond closed labs.
Mistral AI
Mistral AI's open-weight models offer a contrasting design philosophy: competitive context lengths achieved with a focus on efficiency.
Hugging Face
Hugging Face is where most long-context open models land for public use — understanding the platform completes the picture of how these capabilities reach developers.
