Almanac

Learning path

Inference Economics: Who Serves AI, and at What Cost

Running a model is not free — every token generated costs compute, and the economics of who runs models, how, and for whom now shapes the entire AI industry. This path traces the inference supply chain: from the hardware layer up through cloud providers, model labs, and the open-weight ecosystem that is quietly rewriting the cost curve.

This path is aimed at readers who want to understand the business and technical forces behind AI pricing, availability, and competition. No deep ML background is required, but familiarity with what a language model is will help.

Mixed level · 9 steps · ~52 min

  1. NVIDIA

    Start here: NVIDIA's GPUs are the physical substrate on which almost all inference runs, so this layer sets the cost floor for everything above it.

  2. Amazon Web Services

    Cloud providers like AWS are the primary way most organizations access that GPU capacity — this step explains the infrastructure-as-a-service layer that sits between the chip and the model.

  3. Hugging Face

    Hugging Face is the central marketplace for open-weight models and inference APIs, making it the clearest window into how model distribution and self-hosted inference actually work.

  4. Mistral AI

    Mistral AI is the sharpest example of a lab competing on inference efficiency — small, capable open-weight models designed to be cheap to run.

  5. DeepSeek V4

    DeepSeek V4 is the landmark case study in aggressive cost reduction — its release forced a public reckoning with how much inference should cost.

  6. OpenAI

    OpenAI sits at the premium end of the market — understanding its pricing and API strategy shows what labs charge when they lead on capability.

  7. Anthropic

    Anthropic's approach — safety-focused, enterprise-oriented, with tiered model offerings — illustrates a different commercial inference strategy from OpenAI's.

  8. GPT-5.5

    GPT-5.5 is a concrete example of frontier-model inference economics in practice — what the most capable models cost to serve and who pays for them.

  9. Claude Code

    Claude Code shows how inference economics play out in a high-usage agentic product — long contexts and multi-step tasks stress-test per-token pricing in a real deployment.
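
To make the per-token pricing point in the last step concrete, here is a minimal sketch of how the bill for a multi-step agentic task grows as context accumulates. Everything in it is an assumption for illustration: the `Pricing` values are hypothetical and not any vendor's real rates, the helper names are invented, and the model is simplified (each step re-sends the full context, and no prompt caching or discounts are applied).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices in USD per million tokens (not real vendor rates)."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request: input and output tokens are billed at separate rates."""
    billed = (input_tokens * pricing.input_per_mtok
              + output_tokens * pricing.output_per_mtok)
    return billed / 1_000_000


def agentic_task_cost(pricing: Pricing, base_context: int,
                      output_per_step: int, steps: int) -> float:
    """Cost of a multi-step task under a simplified model.

    Assumption: every step re-sends the whole context so far, and each
    step's output is appended to the context used by the next step.
    """
    total = 0.0
    context = base_context
    for _ in range(steps):
        total += call_cost(pricing, context, output_per_step)
        context += output_per_step  # the transcript grows with every step
    return total


if __name__ == "__main__":
    pricing = Pricing(input_per_mtok=2.0, output_per_mtok=10.0)  # hypothetical
    single = call_cost(pricing, 10_000, 1_000)
    task = agentic_task_cost(pricing, base_context=10_000,
                             output_per_step=1_000, steps=10)
    print(f"one call: ${single:.2f}")
    print(f"10-step task: ${task:.2f}")
```

Under these assumed numbers a single call costs about $0.03, while the ten-step task costs about $0.39 rather than $0.30, because the input side of the bill is paid again on a longer context at every step. That compounding is why long-running agentic workloads put more pressure on per-token pricing than one-shot requests do.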