Almanac
model

Mistral 7B

modelactivemistral-7b-f29b35e3·10 events·first seen 1mo ago

Aliases: Mistral 7B

Co-occurring entities

More like this (12)

Recent events (10)

7Mistral Ai News·15d ago·source ↗

Mistral AI Founding Manifesto and Mistral 7B Release

Mistral AI published its founding mission statement alongside the release of Mistral 7B, a 7-billion-parameter open-weights language model released under Apache 2.0. The model claims to outperform all available open models up to 13B parameters on standard English and code benchmarks, produced in three months from a standing start. The post articulates Mistral's strategic thesis: open-weight models will outcompete proprietary black-box APIs for most enterprise use cases, drawing analogies to Linux, WebKit, and Kubernetes. The company signals intent to release progressively larger frontier models while building a commercial offering around on-premise and VPC deployment.

8Mistral Ai News·15d ago·source ↗

Mistral 7B: Open-Weights 7B Model Outperforming Llama 2 13B

Mistral AI released Mistral 7B, a 7.3B parameter language model under the Apache 2.0 license that outperforms Llama 2 13B across all evaluated benchmarks and approaches Llama 34B on many tasks. The model employs Grouped-Query Attention (GQA) for faster inference and Sliding Window Attention (SWA) to handle longer sequences at reduced cost, achieving roughly 2x speed improvement at 16k sequence length. A fine-tuned chat variant, Mistral 7B Instruct, outperforms all 7B chat models on MT-Bench and is competitive with 13B-class chat models. The release includes deployment support for AWS, GCP, Azure, HuggingFace, and local use via vLLM.

5Hugging Face Blog·28d ago·source ↗

WWDC 24: Running Mistral 7B with Core ML

This Hugging Face blog post covers running Mistral 7B on Apple devices using Core ML, likely demonstrated or announced around WWDC 2024. It addresses on-device inference of a 7B parameter open-weights model using Apple's ML framework. This represents a practical deployment pattern for running capable open-weights LLMs locally on Apple Silicon hardware.

6Mistral Ai News·15d ago·source ↗

Mistral AI Launches Model Customization Suite: Open-Source SDK, Managed Fine-Tuning, and Custom Training

Mistral AI has introduced three tiers of model customization on la Plateforme: an open-source LoRA-based fine-tuning SDK (mistral-finetune) for self-hosted use, serverless managed fine-tuning services via API initially supporting Mistral 7B and Mistral Small, and bespoke custom training services including continuous pretraining for enterprise customers. The managed fine-tuning uses LoRA adapters and claims cost and efficiency advantages over full fine-tuning while maintaining comparable performance. This positions Mistral as a full-stack customization provider competing with OpenAI's fine-tuning API and similar offerings.

6Mistral Ai News·15d ago·source ↗

Mistral AI Releases Mathstral 7B: Math-Specialized Model with SOTA Reasoning in Size Category

Mistral AI has released Mathstral 7B, a math and STEM-specialized model built on Mistral 7B, developed in collaboration with Project Numina. The model achieves 56.6% on MATH and 63.47% on MMLU in standard evaluation, improving to 74.59% on MATH with a reward model over 64 candidates using inference-time compute scaling. Weights are open on HuggingFace and compatible with mistral-inference and mistral-finetune tooling.

3Hugging Face Blog·28d ago·source ↗

Comparing RoBERTa, Llama 2, and Mistral for Sequence Classification via LoRA on Disaster Tweets

A Hugging Face blog post benchmarks three models—RoBERTa, Llama 2, and Mistral—on a disaster tweet classification task using LoRA fine-tuning. The analysis compares parameter-efficient adaptation of encoder-only versus decoder-only architectures for a practical NLP classification problem. Results provide practitioners with guidance on model selection and LoRA configuration for sequence classification.

6Qwen Research·1mo ago·source ↗

Qwen1.5-MoE: Matching 7B Model Performance with 1/3 Activated Parameters

Alibaba's Qwen team releases Qwen1.5-MoE-A2.7B, a mixture-of-experts model with only 2.7 billion activated parameters that claims performance parity with 7B dense models such as Mistral 7B and Qwen1.5-7B. The model activates roughly one-third of its total parameters during inference, offering significant compute efficiency gains. This release follows growing industry interest in MoE architectures sparked by Mixtral, and the model is available on GitHub, HuggingFace, and ModelScope.

7Mistral Ai News·15d ago·source ↗

Mistral AI Releases Ministral 3B and 8B Edge Models

Mistral AI has introduced two new small language models, Ministral 3B and Ministral 8B, targeting on-device and edge computing use cases. Both models support up to 128k context length and claim state-of-the-art performance in the sub-10B parameter category, outperforming comparable models from Google and Meta on internal benchmarks. Ministral 8B features an interleaved sliding-window attention mechanism for memory-efficient inference and is priced at $0.1/M tokens via API, while Ministral 3B is priced at $0.04/M tokens. Weights for Ministral 8B Instruct are available for research use, with commercial licensing available on request.

7Mistral Ai News·15d ago·source ↗

Mistral NeMo: 12B Open-Weights Model with 128k Context, Built with NVIDIA

Mistral AI and NVIDIA jointly release Mistral NeMo, a 12B parameter model under Apache 2.0 license featuring a 128k token context window and a new tokenizer called Tekken based on Tiktoken. The model is designed as a drop-in replacement for Mistral 7B, supports multilingual applications across 11+ languages, and was trained with quantization awareness enabling FP8 inference without performance loss. Benchmark comparisons show competitive performance against Gemma 2 9B and Llama 3 8B. Weights are available on HuggingFace and the model is also packaged as an NVIDIA NIM inference microservice.

8Mistral Ai News·15d ago·source ↗

Mistral AI Releases Mixtral 8x22B Under Apache 2.0

Mistral AI has released Mixtral 8x22B, a sparse Mixture-of-Experts model with 141B total parameters but only 39B active parameters, under the permissive Apache 2.0 license. The model features a 64K token context window, native function calling, multilingual support across five European languages, and strong math and coding performance. Mistral claims it outperforms all other open-weight models on standard benchmarks while being faster than dense 70B models due to sparse activation. An instructed version achieves 90.8% on GSM8K maj@8.