Grouped-Query Attention

technique · active · grouped-query-attention-0861e446 · 1 event · first seen Jun 1, 2026

Aliases: Grouped-Query Attention

Co-occurring entities

Mistral AI, MT-Bench, Mistral 7B Instruct v0.2, CodeLlama 7B, Sliding Window Attention, FlashAttention-3, Llama 2, Mistral 7B, CoreWeave, MMLU, HuggingFace, vLLM

More like this (12)

positional attention heads Set Attention Block Graph Attention Network Functional Attention Lightning Attention global attention Debiased One-Pass Attention Sorting Locality-Sensitive Hashing Attention ProbSparse Attention Differential Attention reference attention Lie-Algebra Attention

Recent events (1)

Mistral AI News · Jun 1, 2026

Mistral 7B: Open-Weights 7B Model Outperforming Llama 2 13B

Mistral AI released Mistral 7B, a 7.3B-parameter language model under the Apache 2.0 license. It outperforms Llama 2 13B on all evaluated benchmarks and Llama 1 34B on many of them. The model uses Grouped-Query Attention (GQA), in which groups of query heads share a smaller set of key/value heads, for faster inference. It also uses Sliding Window Attention (SWA) to handle longer sequences at reduced cost, which gives roughly a 2x speed improvement at a sequence length of 16k with a 4k window. A fine-tuned chat variant, Mistral 7B Instruct, outperforms all 7B chat models on MT-Bench and is competitive with 13B-class chat models. The release can be deployed on AWS, GCP, and Azure, is available on HuggingFace, and can be served locally with vLLM.
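To make the two attention mechanisms named above concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not Mistral's implementation: the function name `grouped_query_attention`, the `window` parameter, and the tensor layout are all hypothetical choices made for this example, and the sketch ignores batching, rotary embeddings, and KV caching.

```python
import numpy as np

def grouped_query_attention(q, k, v, window=None):
    """Illustrative GQA with an optional sliding-window causal mask.

    q:    (n_q_heads,  seq, d)
    k, v: (n_kv_heads, seq, d), with n_q_heads divisible by n_kv_heads.
    window: if set, each position attends only to the most recent
            `window` positions, itself included (hypothetical parameter).
    """
    n_q_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    assert n_q_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group = n_q_heads // n_kv_heads

    # Share K/V across a group: query head h reads K/V head h // group.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    # Scaled dot-product scores, shape (n_q_heads, seq, seq).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)

    # Causal mask: position i may attend to positions j <= i.
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    allowed = j <= i
    if window is not None:
        # Sliding window: additionally require i - j < window.
        allowed &= (i - j) < window
    scores = np.where(allowed, scores, -np.inf)

    # Numerically stable softmax over the key axis.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_q_heads, seq, d)

# Example: 8 query heads sharing 2 K/V heads (4 query heads per group).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 5, 4))
k = rng.standard_normal((2, 5, 4))
v = rng.standard_normal((2, 5, 4))
out = grouped_query_attention(q, k, v, window=3)
```

In this sketch only `n_kv_heads` key/value heads need to be stored per token, so a key/value cache would shrink by a factor of `n_q_heads / n_kv_heads` relative to standard multi-head attention, which is where the inference speedup attributed to GQA comes from. The `window` mask bounds how far back each position looks, which is the idea behind SWA's reduced cost on long sequences.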

Long Context Evolution · Frontier Model Releases · Mistral AI · MT-Bench · Mistral 7B Instruct v0.2