Llama 2 70B
llama-2-70b-a38b563f · 4 events · first seen 28d ago
Aliases: Llama 2 70B
Recent events (4)
Fine-tuning Llama 2 70B using PyTorch FSDP
This Hugging Face blog post details a practical workflow for fine-tuning the Llama 2 70B model using PyTorch Fully Sharded Data Parallel (FSDP), focusing on RAM-efficient techniques. The guide addresses the memory challenges of training large-scale open-weight models across multiple GPUs. It serves as a technical reference for practitioners working with frontier-scale open models on distributed infrastructure.
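The memory pressure mentioned above can be illustrated with back-of-the-envelope arithmetic. The sketch below counts only the bf16 weights of a 70B-parameter model and shows how even sharding divides that footprint across GPUs. The 16-GPU count and the helper name `sharded_gb_per_gpu` are illustrative assumptions, not figures from the blog post, and gradients, optimizer state, and activations are deliberately left out.

```python
# Rough weight-memory arithmetic for a 70B-parameter model.
# Assumptions (not from the source): bf16 weights, 16 GPUs, even sharding.
N_PARAMS = 70_000_000_000   # parameter count implied by the model name
BYTES_PER_PARAM = 2         # bf16 stores each parameter in 2 bytes
N_GPUS = 16                 # hypothetical cluster size


def sharded_gb_per_gpu(n_params: int, bytes_per_param: int, n_gpus: int) -> float:
    """Gigabytes of weights each GPU holds when parameters are sharded evenly."""
    return n_params * bytes_per_param / n_gpus / 1e9


# Full, unsharded copy of the weights on a single device.
weights_gb = N_PARAMS * BYTES_PER_PARAM / 1e9

# Per-GPU share once the weights are split across all devices.
per_gpu_gb = sharded_gb_per_gpu(N_PARAMS, BYTES_PER_PARAM, N_GPUS)

print(f"unsharded weights: {weights_gb:.1f} GB")
print(f"per-GPU shard on {N_GPUS} GPUs: {per_gpu_gb:.2f} GB")
```

The unsharded figure exceeds the memory of any single current GPU, which is why sharding the parameters across devices is needed before training can start at all.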
Mixtral 8x7B: Mistral AI Releases Sparse Mixture-of-Experts Open-Weight Model
Mistral AI has released Mixtral 8x7B, a sparse mixture-of-experts (SMoE) model with 46.7B total parameters but only 12.9B active parameters per token, enabling inference speed and cost equivalent to a 12.9B model. Licensed under Apache 2.0, Mixtral outperforms Llama 2 70B on most benchmarks and matches or exceeds GPT-3.5, with support for 32k context, five European languages, and strong code generation. An instruction-tuned variant (Mixtral 8x7B Instruct) achieves 8.3 on MT-Bench, claimed best among open-source models at release. The model is deployed behind Mistral's mistral-small API endpoint and supported via vLLM with Megablocks CUDA kernels.
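The gap between total and active parameters comes from routing: each token is sent to only a few experts, so most expert weights are untouched on any given forward pass. The following is a minimal pure-Python sketch of top-k gating, assuming 8 experts with 2 selected per token. The toy experts, the logits, and the function names `top_k_route` and `moe_forward` are hypothetical and are not Mistral's implementation.

```python
import math


def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the outputs of the k selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Eight toy "experts" (hypothetical): each scales its input by a different factor.
experts = [lambda x, s=s: s * x for s in range(1, 9)]

# Hypothetical router scores for one token; experts 1 and 4 score highest.
logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 1.0]

routes = top_k_route(logits)
output = moe_forward(1.0, experts, logits)
print(routes)
print(output)
```

Only two of the eight expert functions are called for this token, which mirrors why per-token compute tracks the active parameter count rather than the total.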
Mistral AI Releases Mistral Large, Claims Second-Best API Model After GPT-4
Mistral AI has released Mistral Large, its most capable model to date, claiming second place among API-accessible models behind GPT-4 on standard benchmarks including MMLU, HellaSwag, and coding/math evals. The model features a 32K context window, native fluency in five European languages, function calling, and constrained output mode. Simultaneously, Mistral is launching a new Mistral Small optimized for latency, restructuring its endpoint lineup, and announcing Microsoft Azure as its first major distribution partner. This marks Mistral's first significant commercial partnership and expansion beyond its own infrastructure.
Mistral AI Releases Mixtral 8x22B Under Apache 2.0
Mistral AI has released Mixtral 8x22B, a sparse mixture-of-experts model with 141B total parameters but only 39B active parameters per token, under the permissive Apache 2.0 license. The model features a 64K-token context window, native function calling, multilingual support across five European languages, and strong math and coding performance. Mistral claims it outperforms all other open-weight models on standard benchmarks while running faster than dense 70B models because of its sparse activation. An instruction-tuned version achieves 90.8% on GSM8K maj@8.