Entity · technique

Sliding Window Attention

technique · active · sliding-window-attention-c8a1559a · 3 events · first seen Jun 1, 2026

Aliases: Sliding Window Attention, sliding-window attention

More like this (12)

NLL-Guided Full-Attention Layer Selection for Training-Free Sliding-Window Adaptation · Set Attention Block · Graph Convolutional Attention · Spectral Attention · bidirectional attention · Differential Attention · Cross-Layer Sparse Attention · MiniMax Sparse Attention · Debiased One-Pass Attention Sorting · Block Sparse Attention · ProbSparse Attention · sparse attention

Recent events (3)

6 · The Batch · Jun 1, 2026

Test-Time Training End-to-End (TTT-E2E) Retrains Model Weights to Handle Long Inputs

Researchers from Astera Institute, Nvidia, Stanford, UC Berkeley, and UC San Diego introduced TTT-E2E, a method that compresses long context into transformer weights by training the model during inference via meta-learning. The approach uses sliding-window attention restricted to 8,000 tokens and updates only the fully connected layers of the last quarter of the network on each 1,000-token chunk at inference time, keeping per-token generation latency roughly constant as context scales to 128,000 tokens. TTT-E2E slightly outperforms vanilla transformers on next-token prediction loss across long contexts and matches efficient architectures like Mamba 2 and Gated DeltaNet on inference speed, but fails dramatically on Needle-in-a-Haystack retrieval beyond 8,000 tokens and incurs substantially higher training latency. The work reframes long-context handling as a training-inference trade-off rather than an architectural design problem.

Training Infrastructure · Long Context Evolution · University of California San Diego · Mamba · Stanford University · +13 more
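The figures in the TTT-E2E summary above (8,000-token window, 1,000-token chunks, updates confined to the last quarter of the network, 128,000-token context) imply some simple bookkeeping, sketched below. This is only arithmetic on the reported numbers, not the researchers' method: the function name `ttt_plan` and the 24-layer depth are hypothetical choices made for illustration.

```python
from math import ceil

def ttt_plan(context_len, chunk_len=1_000, window=8_000, num_layers=24):
    """Bookkeeping for a TTT-E2E-style run (num_layers=24 is an assumed depth)."""
    # Only the last quarter of the layers receives weight updates.
    first_trainable = num_layers - num_layers // 4
    return {
        # One weight update per chunk of the input context.
        "num_updates": ceil(context_len / chunk_len),
        # Zero-based indices of the layers whose fully connected weights change.
        "trainable_layers": list(range(first_trainable, num_layers)),
        # Attention never looks back further than the sliding window.
        "max_attended_tokens": min(context_len, window),
    }

plan = ttt_plan(128_000)
print(plan)
```

Under these assumptions, the attention span stays capped at the window size however long the context grows, which is consistent with the roughly constant per-token latency the summary reports; anything older than the window has to be carried by the updated weights.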

7 · Mistral AI News · Jun 1, 2026

Mistral AI Releases Ministral 3B and 8B Edge Models

Mistral AI has introduced two new small language models, Ministral 3B and Ministral 8B, targeting on-device and edge computing use cases. Both models support up to 128k context length and claim state-of-the-art performance in the sub-10B parameter category, outperforming comparable models from Google and Meta on internal benchmarks. Ministral 8B features an interleaved sliding-window attention mechanism for memory-efficient inference and is priced at $0.1/M tokens via API, while Ministral 3B is priced at $0.04/M tokens. Weights for Ministral 8B Instruct are available for research use, with commercial licensing available on request.

Long Context Evolution · Frontier Model Releases · Mistral AI · Gemma 2 9B · Ministral 8B · +12 more

8 · Mistral AI News · Jun 1, 2026

Mistral 7B: Open-Weights 7B Model Outperforming Llama 2 13B

Mistral AI released Mistral 7B, a 7.3B-parameter language model under the Apache 2.0 license that outperforms Llama 2 13B on all evaluated benchmarks and outperforms Llama 1 34B on many of them. The model employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to handle longer sequences at reduced cost, achieving a roughly 2x speed improvement at a 16k sequence length. A fine-tuned chat variant, Mistral 7B Instruct, outperforms all 7B chat models on MT-Bench and is competitive with 13B-class chat models. The release includes deployment support for AWS, GCP, Azure, and HuggingFace, as well as local use via vLLM.

Long Context Evolution · Frontier Model Releases · Mistral AI · MT-Bench · Mistral 7B Instruct v0.2 · +13 more
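The sliding-window attention that these events refer to can be illustrated with a minimal NumPy sketch, in which each token attends only to itself and the `window - 1` tokens before it. It is a single-head toy under assumed sizes (8 tokens, window of 3), not the implementation used in Mistral 7B or any other model named above; the function names are chosen for illustration.

```python
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry [i, j] is True when query i may attend to key j."""
    i = np.arange(seq_len)[:, None]  # query positions (column vector)
    j = np.arange(seq_len)[None, :]  # key positions (row vector)
    # Causal (j <= i) and within the last `window` positions (i - j < window).
    return (j <= i) & (i - j < window)

def sliding_window_attention(q, k, v, window):
    """Single-head attention over (seq_len, dim) arrays with a windowed causal mask."""
    seq_len, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    # Masked-out positions get -inf so they receive zero weight after softmax.
    scores = np.where(sliding_window_mask(seq_len, window), scores, -np.inf)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
mask = sliding_window_mask(8, 3)
out = sliding_window_attention(q, k, v, window=3)
print(mask.astype(int))
```

This dense version still computes the full score matrix and only masks it afterwards, so it shows the attention pattern rather than the savings. Production kernels skip the masked blocks entirely and keep only the most recent `window` keys and values in the cache, which is where the memory and speed benefits come from.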