Mistral AI Demonstrates Pixtral-12B Fine-Tuning on Satellite Imagery via LoRA
Mistral AI published a technical case study showing that fine-tuning Pixtral-12B with LoRA on the Aerial Image Dataset (AID) significantly improves satellite image classification over the base model. The post details the fine-tuning workflow via Mistral's API and the la Plateforme UI, covering hyperparameter selection and structured output enforcement. Key improvements include better handling of ambiguous scene categories (e.g., Playground vs. Stadium) and reduced hallucination of invalid class labels. The article positions domain-specific fine-tuning as a practical bridge between general-purpose vision-language models and specialized geospatial applications.
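The point about structured output and invalid class labels can be illustrated with a small client-side check. This is a minimal sketch, not the workflow from the case study: the JSON shape (`{"label": ...}`), the `parse_label` helper, and the short label list are all assumptions made for illustration. AID itself defines 30 scene classes; only a handful are listed here.

```python
import json
from typing import Optional

# Illustrative subset of scene classes (assumption); AID defines 30 in total.
ALLOWED_LABELS = {"Airport", "Beach", "Bridge", "Playground", "Stadium"}


def parse_label(raw_reply: str) -> Optional[str]:
    """Return the predicted label if the reply is a JSON object whose
    "label" field is one of the allowed classes; otherwise return None."""
    try:
        payload = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # reply was not valid JSON at all
    if not isinstance(payload, dict):
        return None  # valid JSON, but not an object
    label = payload.get("label")
    if isinstance(label, str) and label in ALLOWED_LABELS:
        return label
    return None  # missing, non-string, or hallucinated label


if __name__ == "__main__":
    print(parse_label('{"label": "Stadium"}'))
    print(parse_label('{"label": "Arena"}'))
```

A check like this makes hallucinated labels measurable: counting the `None` results before and after fine-tuning gives a simple invalid-label rate.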
Related events (8)
Mistral AI Launches Model Customization Suite: Open-Source SDK, Managed Fine-Tuning, and Custom Training
Mistral AI has introduced three tiers of model customization on la Plateforme: an open-source LoRA-based fine-tuning SDK (mistral-finetune) for self-hosted use, serverless managed fine-tuning services via API initially supporting Mistral 7B and Mistral Small, and bespoke custom training services including continuous pretraining for enterprise customers. The managed fine-tuning uses LoRA adapters and claims cost and efficiency advantages over full fine-tuning while maintaining comparable performance. This positions Mistral as a full-stack customization provider competing with OpenAI's fine-tuning API and similar offerings.
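The efficiency claim for LoRA follows from simple parameter arithmetic: instead of updating a full weight matrix, LoRA trains two low-rank factors. The sketch below compares the two counts for a single layer. The layer dimensions and rank are illustrative assumptions, not values published by Mistral, and `lora_param_counts` is a hypothetical helper.

```python
from typing import Tuple


def lora_param_counts(d_in: int, d_out: int, rank: int) -> Tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix.
    LoRA trains factors B (d_out x rank) and A (rank x d_in) instead.
    """
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    # Illustrative sizes only (assumption): a 4096 x 4096 projection, rank 8.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

For these assumed sizes the adapter trains well under 1% of the parameters of the full matrix, which is where the memory and cost savings come from.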
Mistral AI Releases Mistral Small v24.09, Free API Tier, and Pixtral 12B Vision on le Chat with Broad Price Cuts
Mistral AI announced a multi-part release on September 17, 2024: a free tier for la Plateforme API, significant price reductions across its model family (up to 80% for Mistral Small and Codestral), an updated Mistral Small v24.09 (22B parameters, improved alignment and reasoning), and the availability of Pixtral 12B vision capabilities on le Chat. Pixtral 12B, released under Apache 2.0, supports images of any size without text performance degradation and is now accessible for free on le Chat. The pricing updates also apply to cloud partner deployments on Azure AI Studio, Amazon Bedrock, and Google Vertex AI.
Pixtral Large: Mistral AI's 124B Open-Weights Multimodal Model
Mistral AI released Pixtral Large, a 124B open-weights multimodal model built on Mistral Large 2, featuring a 1B-parameter vision encoder and a 128K context window that fits at least 30 high-resolution images. The model claims state-of-the-art results on MathVista, DocVQA, and ChartQA, outperforming GPT-4o and Gemini-1.5 Pro on several benchmarks, and leads the LMSys Vision Leaderboard among open-weights models by ~50 Elo points. Simultaneously, Mistral updated its text model to Mistral Large 24.11 with improvements in long-context understanding, function calling, and RAG/agentic workflows. Note: the model has since been deprecated and replaced by newer Mistral vision models.
Pixtral 12B: Mistral AI's First Multimodal Model (Now Deprecated)
Mistral AI released Pixtral 12B in September 2024 under the Apache 2.0 license as its first natively multimodal model, combining a new 400M-parameter vision encoder trained from scratch with a 12B multimodal decoder based on Mistral Nemo. The model supports variable image sizes and aspect ratios, offers a 128K-token context window for multiple images, and achieved 52.5% on MMMU while maintaining strong text-only benchmark performance. It is now deprecated and has been replaced by newer vision and multimodal models from Mistral.
Mistral AI Announces Fine-Tuning for Flagship Models, Agents Alpha, and SDK 1.0
Mistral AI has announced three platform updates: fine-tuning support for all flagship and specialist models on la Plateforme (including Mistral Large 2 and Codestral), an alpha release of an Agents feature enabling custom workflows via le Chat or the API, and a stable 1.0 release of the mistralai Python and TypeScript SDKs. Customization options span base prompts, few-shot prompting, and full fine-tuning with custom datasets. The Agents feature is described as early-stage, with tool and data-source integrations planned.
Mistral Small 4: Unified Multimodal, Reasoning, and Coding MoE Model Released Under Apache 2.0
Mistral AI has released Mistral Small 4, a 119B-parameter Mixture-of-Experts model (6B active per token) that unifies capabilities previously split across Magistral (reasoning), Pixtral (multimodal), and Devstral (coding agents) into a single open-weights model. The model features a 256k context window, configurable reasoning effort via a `reasoning_effort` parameter, native text and image input support, and is released under Apache 2.0. Mistral claims 40% latency reduction and 3x throughput improvement over Mistral Small 3, with benchmark results showing competitive performance against GPT-OSS 120B and Qwen models while producing significantly shorter outputs. The release includes day-0 availability as an NVIDIA NIM and support across vLLM, llama.cpp, SGLang, and Transformers.
Mistral Small 3: 24B Latency-Optimized Open-Weight Model Released Under Apache 2.0
Mistral AI has released Mistral Small 3, a 24B-parameter instruction-tuned model optimized for low latency, achieving over 81% on MMLU at 150 tokens/s on a single GPU. The model is competitive with Llama 3.3 70B and Qwen 32B while being more than 3x faster on equivalent hardware, and is released under Apache 2.0 for both pretrained and instruction-tuned checkpoints. It is explicitly not trained with RL or synthetic data, positioning it as a base model for community fine-tuning and reasoning capability development. Deployment targets include local inference on consumer hardware (RTX 4090, MacBook 32GB RAM), agentic function calling, and domain-specific fine-tuning.
Mistral AI Releases Mathstral 7B: Math-Specialized Model with SOTA Reasoning in Size Category
Mistral AI has released Mathstral 7B, a math- and STEM-specialized model built on Mistral 7B and developed in collaboration with Project Numina. The model achieves 56.6% on MATH and 63.47% on MMLU in standard evaluation; with inference-time compute scaling, where a reward model selects the best of 64 candidate answers, the MATH score rises to 74.59%. Weights are open on HuggingFace and compatible with the mistral-inference and mistral-finetune tooling.
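The reward-model result is an instance of best-of-N selection: sample several candidate answers, score each one, and keep the highest-scoring candidate. The following is a minimal sketch of that selection step only. `best_of_n` and `toy_reward` are hypothetical names, and the toy scoring rule stands in for a learned reward model; it is not the one used for Mathstral.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], reward: Callable[[str], float]) -> str:
    """Return the candidate with the highest reward score.

    Ties resolve to the earliest candidate, as with Python's max().
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=reward)


def toy_reward(answer: str) -> float:
    # Hypothetical stand-in for a learned reward model:
    # it simply prefers answers that end with "= 4".
    return 1.0 if answer.strip().endswith("= 4") else 0.0


if __name__ == "__main__":
    # In practice these would be N sampled model completions (N = 64 in the summary above).
    samples = ["2 + 2 = 5", "2 + 2 = 4", "2 + 2 = 22"]
    print(best_of_n(samples, toy_reward))
```

The extra accuracy comes at the price of generating and scoring N completions per question, which is why the technique is described as scaling inference-time compute.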



