Almanac
technique

Grouped-Query Attention

techniqueactiveprovisionalgrouped-query-attention-0861e446·1 events·first seen 15d ago

Aliases: Grouped-Query Attention

Co-occurring entities

More like this (12)

Recent events (1)

8Mistral Ai News·15d ago·source ↗

Mistral 7B: Open-Weights 7B Model Outperforming Llama 2 13B

Mistral AI released Mistral 7B, a 7.3B parameter language model under the Apache 2.0 license that outperforms Llama 2 13B across all evaluated benchmarks and approaches Llama 34B on many tasks. The model employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to handle longer sequences at reduced cost, achieving roughly 2x speed improvement at 16k sequence length. A fine-tuned chat variant, Mistral 7B Instruct, outperforms all 7B chat models on MT-Bench and is competitive with 13B-class chat models. The release includes deployment support for AWS, GCP, Azure, HuggingFace, and local use via vLLM.