Entity · technique

Prefill/Decode Disaggregation

technique · active · prefill-decode-disaggregation-9725723e · 2 events · first seen May 18, 2026

Aliases: Prefill/Decode Disaggregation, prefill-decode disaggregation

Co-occurring entities

Hugging Face, TNG Technology Consulting, Mistral AI, Mistral-medium, Heaptrack, Mathis Felardos, NIXL, UCX (Unified Communication X), vLLM, InfiniBand

More like this (12)

RefDecoder, speculative decoding, Parallel Decoding Distillation, Random Coding, DECODEM, DecodingTrust, near-deduplication, blockwise decoding, DFM Decoder, Positional Encoding, Grammar-Constrained Decoding, Parallel Box Decoding

Recent events (2)

4 · Hugging Face Blog · May 19, 2026

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

This Hugging Face blog post from TNG Technology Consulting examines how prefill and decode phases interact under concurrent request loads in LLM serving systems. It analyzes performance bottlenecks that arise when multiple requests share GPU resources, covering throughput-latency tradeoffs and optimization strategies. The piece targets practitioners deploying LLMs at scale who need to understand scheduling and batching behavior.

Training Infrastructure, Inference Economics, Prefill/Decode Disaggregation, Hugging Face, TNG Technology Consulting
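
As a rough illustration of the prefill/decode interaction summarized in the event above, the following toy model shows why a long prefill can stall the tokens of a request that is already decoding on the same GPU, and why moving prefill to a separate worker removes that stall. It is a sketch under stated assumptions, not the analysis from the blog post: the function name and both cost constants are hypothetical, and the model ignores batching, chunked prefill, and the cost of transferring the KV cache between workers.

```python
# Toy cost model; both constants are hypothetical, not measured values.
PREFILL_MS_PER_TOKEN = 0.5   # assumed cost to prefill one prompt token
DECODE_STEP_MS = 20.0        # assumed cost of one decode step


def inter_token_gaps(decode_steps, prefill_at_step, prefill_tokens, colocated):
    """Return the wait (ms) before each token of an already-running request.

    If `colocated` is True, a new request's prefill runs on the same GPU at
    `prefill_at_step` and delays that decode step. If False, prefill runs on
    a separate worker and the decode cadence is unaffected.
    """
    gaps = []
    for step in range(decode_steps):
        gap = DECODE_STEP_MS
        if colocated and step == prefill_at_step:
            # The running request waits for the newcomer's whole prompt.
            gap += prefill_tokens * PREFILL_MS_PER_TOKEN
        gaps.append(gap)
    return gaps


# A 2000-token prompt arrives while another request is at decode step 2.
shared = inter_token_gaps(5, prefill_at_step=2, prefill_tokens=2000, colocated=True)
split = inter_token_gaps(5, prefill_at_step=2, prefill_tokens=2000, colocated=False)
print("worst inter-token gap, shared GPU:", max(shared), "ms")
print("worst inter-token gap, disaggregated:", max(split), "ms")
```

In this model the shared-GPU case shows a single large gap at the step where the prefill lands, while the disaggregated case keeps a constant cadence. Real systems trade that benefit against the KV cache transfer, which this sketch leaves out.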

6 · Mistral AI News · May 18, 2026

Mistral AI Engineering Deep Dive: Debugging a Memory Leak in vLLM

Mistral AI's engineering team investigated a memory leak in vLLM that appeared exclusively during disaggregated prefill/decode serving with Mistral Medium 3.1 and graph compilation enabled, causing ~400 MB/min RSS growth. The leak was not visible in heap profilers (Memray, Guppy3, Heaptrack), pointing to off-heap memory allocation tied to NIXL/UCX-based KV cache transfer over InfiniBand. The post is the first in a new Engineering Deep Dive series and documents a methodical descent from Python-level tools to kernel-level tracing to isolate the root cause.

Training Infrastructure, Inference Economics, Mistral AI, Prefill/Decode Disaggregation, Mistral-medium
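
The reasoning in the event above (RSS grows steadily while heap profilers show nothing, so the allocation is likely off-heap) can be sketched as a simple comparison of two growth rates. This is a minimal illustration, not the method used in the Mistral AI post: the function names and thresholds are hypothetical, `tracemalloc` stands in for the profilers named in the summary, and it only sees allocations made through Python's allocator, so it is a coarser check than a native heap profiler. The RSS reader assumes Linux (`/proc/self/statm`).

```python
import os
import tracemalloc

MIB = 1024 * 1024


def read_rss_bytes():
    """Current resident set size of this process (Linux only)."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])  # second field: resident pages
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def read_python_heap_bytes():
    """Bytes currently tracked by tracemalloc (requires tracemalloc.start())."""
    current, _peak = tracemalloc.get_traced_memory()
    return current


def growth_mib_per_min(samples):
    """Average growth rate between the first and last (seconds, bytes) sample."""
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (b1 - b0) / MIB / ((t1 - t0) / 60.0)


def looks_off_heap(rss_samples, heap_samples,
                   min_rss_rate=50.0, max_heap_fraction=0.1):
    """Heuristic: RSS grows fast while the tracked heap accounts for little of it.

    Both thresholds are arbitrary illustration values, not recommendations.
    """
    rss_rate = growth_mib_per_min(rss_samples)
    heap_rate = growth_mib_per_min(heap_samples)
    return rss_rate >= min_rss_rate and heap_rate <= max_heap_fraction * rss_rate
```

To use it, call `tracemalloc.start()`, record `(time.monotonic(), read_rss_bytes())` and `(time.monotonic(), read_python_heap_bytes())` pairs at intervals while the workload runs, and pass the two sample lists to `looks_off_heap`. A `True` result only says where not to look; finding the actual allocator still takes lower-level tracing of the kind the post describes.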