Entity · model

Qwen2.5-VL

model · active · qwen2-5-vl-539281ca · 7 events · first seen May 18, 2026

Aliases: Qwen2.5-VL, Qwen2.5 VL, Qwen2-VL, Qwen2-VL 7B

More like this (12)

Qwen-2.5-VL-3B · Qwen-VL · Qwen2.5-VL-72B · Qwen2.5 · Qwen-3-VL-2B · Qwen 2.5-7B · Qwen3VL-8B · Qwen3VL-8B · Qwen2.5-7B · Qwen2.5-VL-32B-Instruct · Qwen-VL-Max · Qwen2.5-3B

Recent events (7)

5 · arXiv · cs.CL · 3d ago

Scale vs. quantization tradeoffs for uncertainty signals in vision-language models under image degradation

A new arXiv paper evaluates how model scale and 4-bit quantization affect two confidence signals (verbalized confidence and mean token probability) in the Qwen2-VL family, across 5,700 predictions under six types of photographic degradation. Key findings: scaling from 2B to 7B sharply improves the internal signal, mean token probability (AUROC 0.80→0.98), while verbalized confidence remains weak; 4-bit quantization costs little in accuracy (a 1.6-point drop) but degrades the internal signal (AUROC 0.95→0.80) and cuts the parse rate of verbalized confidence from 99% to 64%. The practical recommendation is to prefer a larger quantized model over a smaller full-precision one within a fixed memory budget, and to use error-detection AUROC rather than calibration error as the primary metric.

Evaluation and Benchmarking · Inference Economics · Qwen2.5-VL · Bigger or Cheaper? Scale and Quantization Effects on Uncertainty Signals in Vision-Language Models Under Image Degradation
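To make the two signals and the recommended metric concrete, here is a minimal Python sketch under stated assumptions: mean token probability is taken as the arithmetic mean of per-token probabilities, verbalized confidence is parsed from a hypothetical "Confidence: NN%" suffix, and error-detection AUROC is computed in its rank (Mann-Whitney) form. The function names and the answer format are illustrative, not the paper's own code or prompt.

```python
import math
import re
from typing import Optional, Sequence

# Hypothetical prompt convention: the model is asked to end its answer with
# "Confidence: NN%". The paper's actual prompt and parser are not described here.
_CONFIDENCE_RE = re.compile(r"confidence\s*[:=]\s*(\d{1,3}(?:\.\d+)?)\s*%", re.IGNORECASE)


def mean_token_probability(token_logprobs: Sequence[float]) -> float:
    """Internal signal: arithmetic mean of the per-token probabilities of one answer."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)


def parse_verbalized_confidence(text: str) -> Optional[float]:
    """Verbalized signal: confidence in [0, 1], or None if it cannot be parsed."""
    match = _CONFIDENCE_RE.search(text)
    if match is None:
        return None
    value = float(match.group(1))
    return value / 100.0 if value <= 100.0 else None


def error_detection_auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    """Probability that a random correct prediction gets higher confidence than a
    random incorrect one (ties count 0.5): the Mann-Whitney form of AUROC."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect prediction")
    score = 0.0
    for p in pos:
        for n in neg:
            score += 1.0 if p > n else 0.5 if p == n else 0.0
    return score / (len(pos) * len(neg))


outputs = [
    "The sign reads STOP. Confidence: 90%",
    "Probably a cat.",                      # no confidence stated -> unparsed
    "It is a bus. confidence = 40 %",
]
parsed = [parse_verbalized_confidence(o) for o in outputs]
parse_rate = sum(p is not None for p in parsed) / len(parsed)
print(parsed, round(parse_rate, 3))  # [0.9, None, 0.4] 0.667
print(error_detection_auroc([0.9, 0.4, 0.5], [True, True, False]))  # 0.5
```

Because AUROC only looks at how confidences rank correct answers against wrong ones, it measures error detection directly and is unaffected by whether the confidences are well calibrated in absolute terms, which is one way to read the paper's preference for it over calibration error.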

5 · Mistral AI News · May 18, 2026

Pixtral 12B: Mistral AI's First Multimodal Model (Now Deprecated)

Mistral AI released Pixtral 12B in September 2024 as its first natively multimodal model, combining a new 400M-parameter vision encoder trained from scratch with a 12B multimodal decoder based on Mistral Nemo. The model supports variable image sizes and aspect ratios and a 128K-token context window that can hold multiple images; it scored 52.5% on MMMU while maintaining strong performance on text-only benchmarks. It was released under the Apache 2.0 license. The model is now deprecated and has been replaced by newer vision and multimodal models from Mistral.

Frontier Model Releases · Open Weights Progress · Qwen2.5-VL · Mistral AI · MT-Bench
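To illustrate how variable image sizes interact with the context window, here is a rough Python estimate of the token cost of one image. It assumes the tokenization scheme Mistral described for Pixtral (one token per 16×16-pixel patch, a break token between patch rows, and an end token per image) and takes "128K" as 128,000 tokens. The real processor may resize large images first, so actual counts can differ. The function names are hypothetical.

```python
import math

PATCH = 16  # assumed patch size in pixels, per Mistral's description of Pixtral


def pixtral_image_tokens(width: int, height: int, patch: int = PATCH) -> int:
    """Estimate tokens for one image at native resolution (no resizing applied)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    cols = math.ceil(width / patch)
    rows = math.ceil(height / patch)
    # rows * cols patch tokens + (rows - 1) row-break tokens + 1 end token
    return rows * cols + rows


def images_that_fit(context_tokens: int, width: int, height: int, reserved: int = 0) -> int:
    """How many same-sized images fit after reserving `reserved` tokens for text."""
    per_image = pixtral_image_tokens(width, height)
    return max(0, (context_tokens - reserved) // per_image)


print(pixtral_image_tokens(1024, 1024))      # 4160
print(pixtral_image_tokens(1024, 512))       # 2080: wide image, half the rows
print(images_that_fit(128_000, 1024, 1024))  # 30
```

Because the cost scales with the number of patches, keeping an image at its native aspect ratio avoids spending tokens on padding, and smaller images leave room for more of them in the same context.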

7 · Qwen Research · May 18, 2026

Qwen2-VL: Alibaba Releases Latest Vision-Language Model with Extended Video Understanding

Alibaba's Qwen team has released Qwen2-VL, the latest iteration of its vision-language model series, built on the Qwen2 foundation. The team reports state-of-the-art performance on visual understanding benchmarks including MathVista, DocVQA, RealWorldQA, and MTVQA. A notable capability is understanding videos longer than 20 minutes for question answering, dialog, and content creation.

Frontier Model Releases · Evaluation and Benchmarking · Qwen2.5-VL · RealWorldQA · DocVQA
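The announcement does not say how a video longer than 20 minutes is turned into model input. One common way to fit a long video into a bounded context is to sample frames uniformly at a fixed rate and cap the total, and the Python sketch below shows that technique. The function name, the 2 fps rate, and the 768-frame cap are illustrative assumptions, not Qwen2-VL's documented settings.

```python
def sample_frame_indices(
    total_frames: int,
    native_fps: float,
    target_fps: float = 2.0,  # illustrative sampling rate, not a documented setting
    max_frames: int = 768,    # illustrative cap on frames sent to the model
) -> list[int]:
    """Pick evenly spaced frame indices at roughly target_fps, capped at max_frames."""
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0 or max_frames <= 0:
        raise ValueError("all arguments must be positive")
    duration_s = total_frames / native_fps
    n = min(max_frames, total_frames, max(1, round(duration_s * target_fps)))
    if n == 1:
        return [0]
    step = (total_frames - 1) / (n - 1)  # spread n samples from first to last frame
    return [round(i * step) for i in range(n)]


# A 25-minute clip at 30 fps has 45,000 frames; 2 fps would give 3,000 samples,
# so the cap applies and the 768 frames are spread evenly over the whole video.
indices = sample_frame_indices(total_frames=25 * 60 * 30, native_fps=30)
print(len(indices), indices[0], indices[-1])  # 768 0 44999
```

The cap trades temporal detail for a bounded input size: beyond a certain length, longer videos get sparser samples rather than more tokens.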

8 · Qwen Research · May 18, 2026

Qwen2.5-VL: Alibaba's New Flagship Vision-Language Model Released in 3B/7B/72B Sizes

Alibaba's Qwen team has released Qwen2.5-VL, its new flagship vision-language model and a significant upgrade over Qwen2-VL. The release includes base and instruct variants in three sizes (3B, 7B, 72B), all open-weight and available on Hugging Face and ModelScope; the 72B instruct model is also accessible via Qwen Chat. The announcement highlights enhanced visual understanding and positions the model as a major step forward in multimodal performance.

Frontier Model Releases · Open Weights Progress · Qwen2.5-VL · Qwen Chat · Hugging Face

7 · Qwen Research · May 18, 2026

Qwen2.5-VL-32B: Reinforcement-Learning-Optimized Vision-Language Model Released

Alibaba's Qwen team has released Qwen2.5-VL-32B-Instruct, a 32-billion-parameter vision-language model built on the Qwen2.5-VL series and further optimized with reinforcement learning. The model is open-sourced under the Apache 2.0 license and available on Hugging Face and ModelScope. It follows the January 2025 launch of the broader Qwen2.5-VL series, positioning the 32B scale as a balance between capability and deployment practicality.

Open Weights Progress · Inference Economics · Qwen2.5-VL · Qwen2.5-VL-32B-Instruct · Apache 2.0
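Much of the "deployment practicality" argument comes down to memory. A weights-only estimate (parameters × bytes per parameter) shows where 32B sits: at 16-bit precision its weights need about 64 GB, against about 14 GB for 7B and 144 GB for 72B. The Python sketch below is a back-of-the-envelope helper under stated assumptions, not an official sizing guide: it uses the nominal sizes in the model names and ignores KV cache, activations, visual tokens, and runtime overhead. The function names are hypothetical.

```python
from typing import Optional

# Nominal parameter counts (billions) taken from the model names; actual
# parameter counts differ slightly.
QWEN25_VL_SIZES_B = {"3B": 3, "7B": 7, "32B": 32, "72B": 72}


def weight_memory_gb(params_billion: float, bits_per_param: float = 16) -> float:
    """Weights only: billions of parameters x bytes per parameter = GB (decimal).
    Excludes KV cache, activations, image/video tokens, and runtime overhead."""
    return params_billion * bits_per_param / 8


def largest_that_fits(budget_gb: float, bits_per_param: float = 16) -> Optional[str]:
    """Largest nominal size whose weights alone fit in budget_gb, or None."""
    fitting = [
        (size, name)
        for name, size in QWEN25_VL_SIZES_B.items()
        if weight_memory_gb(size, bits_per_param) <= budget_gb
    ]
    return max(fitting)[1] if fitting else None


for name, size in QWEN25_VL_SIZES_B.items():
    print(name, weight_memory_gb(size), "GB at 16-bit")  # 6.0, 14.0, 64.0, 144.0
print(largest_that_fits(80))  # 32B
```

Under these assumptions, an 80 GB budget admits the 32B model at 16-bit but not the 72B one; passing a lower `bits_per_param` shows how quantization shifts that boundary.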

7 · Qwen Research · May 18, 2026

Qwen VLo: Unified Multimodal Understanding and Generation Model

Alibaba's Qwen team has announced Qwen VLo, a new model that unifies multimodal understanding and image generation in a single architecture. Building on the Qwen2.5-VL lineage, the model is positioned to both comprehend and generate high-quality visual content. This is a step toward unified perception-and-creation models, a direction several frontier labs are pursuing at the same time.

Frontier Model Releases · Multimodal Progress · Qwen-VL · Qwen2.5-VL · Alibaba Qwen

6 · Qwen Research · May 18, 2026

Qwen-Image-Edit: Image Editing Model with Text Rendering and Dual Visual Control

Alibaba's Qwen team has released Qwen-Image-Edit, a 20B-parameter image editing model built on the Qwen-Image foundation. The model extends Qwen-Image's text rendering capabilities to editing tasks, enabling precise modification of text inside images. It uses a dual-path design: input images are fed simultaneously into Qwen2.5-VL for semantic control and into a VAE encoder for appearance control, supporting both semantic and appearance-level edits.

Frontier Model Releases · Multimodal Progress · Qwen2.5-VL · Qwen-Image-Edit · Qwen-Image
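The dual-path design is mainly a statement about data flow: the same input image goes to two encoders with different jobs. The Python sketch below shows only that flow. Both encoders are hypothetical callables standing in for Qwen2.5-VL and the VAE encoder, and the toy implementations exist only so the example runs. The summary does not say how Qwen-Image-Edit fuses the two streams, so the sketch stops at bundling them.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Image = Sequence[float]  # placeholder for pixel data
Features = List[float]


@dataclass
class EditConditioning:
    semantic: Features    # from the vision-language path (what the image depicts)
    appearance: Features  # from the VAE-encoder path (how the image looks)
    instruction: str


def dual_path_condition(
    image: Image,
    instruction: str,
    semantic_encoder: Callable[[Image], Features],    # hypothetical stand-in for Qwen2.5-VL
    appearance_encoder: Callable[[Image], Features],  # hypothetical stand-in for the VAE encoder
) -> EditConditioning:
    """Feed the same input image into both paths and bundle the two results."""
    return EditConditioning(
        semantic=semantic_encoder(image),
        appearance=appearance_encoder(image),
        instruction=instruction,
    )


# Toy encoders so the sketch runs: a coarse summary vs. a near-copy of the pixels.
def toy_semantic(img: Image) -> Features:
    return [sum(img) / len(img)]


def toy_vae(img: Image) -> Features:
    return [round(x, 1) for x in img]


cond = dual_path_condition([0.25, 0.5, 0.75], "change the sign text to OPEN", toy_semantic, toy_vae)
print(cond.semantic, len(cond.appearance))
```

Keeping the two paths separate reflects the split in the summary: the semantic path carries what should change, while the appearance path preserves the low-level detail of the original image.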