technique
Grouped-Query Attention
techniqueactiveprovisional
grouped-query-attention-0861e446·1 events·first seen 15d agoAliases: Grouped-Query Attention
Co-occurring entities
More like this (12)
Recent events (1)
Mistral 7B: Open-Weights 7B Model Outperforming Llama 2 13B
Mistral AI released Mistral 7B, a 7.3B parameter language model under the Apache 2.0 license that outperforms Llama 2 13B across all evaluated benchmarks and approaches Llama 34B on many tasks. The model employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to handle longer sequences at reduced cost, achieving roughly 2x speed improvement at 16k sequence length. A fine-tuned chat variant, Mistral 7B Instruct, outperforms all 7B chat models on MT-Bench and is competitive with 13B-class chat models. The release includes deployment support for AWS, GCP, Azure, HuggingFace, and local use via vLLM.