Entity · paper

End-to-End Context Compression at Scale

paper · active · end-to-end-context-compression-at-scale-0b48ce35 · 1 event · first seen Jun 9, 2026

Aliases: End-to-End Context Compression at Scale


More like this (12)

Context-Driven Incremental Compression
gradient compression
Prompt Compression via Activation Aggregation
thought compression
Braun et al. 2025 Compressed Computation
Planning-aligned Token Compression for Long-Context Autonomous Driving
Context-Driven Incremental Compression for Multi-Turn Dialogue Generation
Incremental Context Expansion
Requential Coding: Pushing the Limits of Model Compression with Self-Generated Training Data
visual-token compression
NNCF (Neural Network Compression Framework)
context compaction

Recent events (1)

arXiv · cs.CL · Jun 9, 2026

Latent Context Language Models (LCLMs) achieve competitive encoder-decoder KV cache compression at scale

Researchers introduce Latent Context Language Models (LCLMs), a family of encoder-decoder compressors that map long token sequences to shorter sequences of latent embeddings consumed by a decoder, targeting the key-value (KV) cache memory bottleneck in long-context inference. The authors run an architecture search and continually pre-train models pairing a 0.6B encoder with a 4B decoder on over 350B tokens at compression ratios of 1:4, 1:8, and 1:16. LCLMs improve the Pareto frontier across general-task performance, compression speed, and peak memory, and are demonstrated as efficient backbones for long-horizon agents that can skim compressed context and expand relevant segments on demand. The work closes a previously noted gap between encoder-decoder approaches and KV cache compression methods on the accuracy-efficiency frontier.
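The memory saving behind these compression ratios can be estimated with back-of-the-envelope KV cache arithmetic. The sketch below is a minimal illustration under stated assumptions: `DecoderShape`, `kv_cache_bytes`, and `latent_positions` are hypothetical helpers, the decoder shape is an arbitrary example rather than the paper's 4B configuration, and each latent embedding is assumed to occupy exactly one decoder position. It ignores the encoder's own activations and any uncompressed tokens (such as the question being answered), so the paper's actual peak-memory figures may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderShape:
    """Hypothetical decoder shape; NOT the configuration used in the paper."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2  # e.g. fp16 / bf16 storage


def kv_cache_bytes(shape: DecoderShape, positions: int) -> int:
    # Keys and values (factor 2) are cached for every layer, KV head, and position.
    return (2 * shape.num_layers * shape.num_kv_heads * shape.head_dim
            * positions * shape.bytes_per_elem)


def latent_positions(num_tokens: int, ratio: int) -> int:
    # Assumption: a 1:ratio compressor emits one latent embedding per `ratio`
    # input tokens, rounding up when the final chunk is partial.
    return math.ceil(num_tokens / ratio)


if __name__ == "__main__":
    shape = DecoderShape(num_layers=36, num_kv_heads=8, head_dim=128)  # illustrative only
    context = 128_000
    full = kv_cache_bytes(shape, context)
    print(f"uncompressed: {full / 2**30:.2f} GiB")
    for r in (4, 8, 16):
        compressed = kv_cache_bytes(shape, latent_positions(context, r))
        print(f"1:{r:<2} -> {compressed / 2**30:.2f} GiB ({full / compressed:.1f}x smaller)")
```

Because the cache grows linearly with the number of cached positions, a 1:r compressor shrinks the cached context by roughly a factor of r under these assumptions; the open question the Pareto-frontier comparison addresses is how much task accuracy survives that reduction.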

Long Context Evolution · Inference Economics · End-to-End Context Compression at Scale · Latent Context Language Models · +1 more