Entity · model

Qwen-3-VL-2B

model · active · qwen-3-vl-2b-247125c7 · 2 events · first seen Jun 15, 2026

Aliases: Qwen-3-VL-2B, QwenVL3-2B

Co-occurring entities

PGPS9K · Geometry3K · SD-GPS · MathVista · When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks · BLINK · Direct Preference Optimization (DPO) · Qwen-2.5-VL-3B · MMMU

More like this (12)

Qwen-2.5-VL-3B Qwen3VL-8B Qwen3VL-8B Qwen2.5-VL Qwen2.5-VL-72B Qwen-VL Qwen3-30B Qwen3-14B Qwen2.5-3B Qwen3-4B Qwen3-VL-Thinking Qwen3.5-35B-A3B

Recent events (2)

4 · arXiv · cs.CL · Jun 29, 2026

SD-GPS: Solver-Driven Autoformalization and Theorem Proposing for Geometry Problem Solving

Researchers propose SD-GPS, a neuro-symbolic framework for geometry problem solving that treats a symbolic solver as an execution oracle during both the formalization and deduction stages. The system combines solvability-guided reinforcement learning for autoformalization (built on QwenVL3-2B) with an impasse-aware agent that proposes auxiliary lemmas and verifies them symbolically. Evaluations on Geometry3K and PGPS9K show that SD-GPS outperforms existing multimodal, neural, and neuro-symbolic baselines across multiple task regimes. The work advances the line of research on grounding neural agents in formal systems for verifiable mathematical reasoning.

Evaluation and Benchmarking · Multimodal Progress · PGPS9K · Geometry3K · Qwen-3-VL-2B · +1 more

6 · arXiv · cs.AI · Jun 15, 2026

Self-improving VLMs can silently regress when verifier quality is task-mismatched

A new arXiv paper demonstrates that verifier-driven self-DPO, a common recipe for self-improving vision-language models (VLMs), can silently degrade student model performance when the verifier's task-rubric accuracy is insufficient for the target task. Experiments on Qwen-3-VL-2B and Qwen-2.5-VL-3B across MathVista, MMMU, and BLINK show regressions of 3.4–10.9 percentage points below frozen baselines, with the counterintuitive finding that verifiers that are more accurate but still wrong cause larger regressions than near-random ones. The authors provide a mechanistic explanation via a variance theorem for progress-gated replay and offer operational guidance: measure target-task rubric accuracy before running any verifier-driven loop, and rank verifiers by task-specific quality rather than parameter count.

Evaluation and Benchmarking · Alignment and RLHF · MathVista · When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks · BLINK · +5 more
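
The operational guidance in the summary above (measure target-task rubric accuracy first, then rank verifiers by that accuracy rather than by size) can be illustrated with a minimal sketch. This is not the paper's code: the verifier names, parameter counts, gold labels, verdicts, and the 0.8 threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VerifierReport:
    name: str               # hypothetical verifier identifier
    params_b: float         # parameter count in billions (illustrative only)
    rubric_accuracy: float  # agreement with gold labels on the target task


def rubric_accuracy(verdicts, gold):
    """Fraction of verifier verdicts that match gold labels on a target-task sample."""
    if not gold or len(verdicts) != len(gold):
        raise ValueError("verdicts and gold must be non-empty and equally long")
    return sum(v == g for v, g in zip(verdicts, gold)) / len(gold)


def rank_verifiers(reports, min_accuracy):
    """Keep verifiers at or above the threshold, sorted by accuracy, not size."""
    eligible = [r for r in reports if r.rubric_accuracy >= min_accuracy]
    return sorted(eligible, key=lambda r: r.rubric_accuracy, reverse=True)


# Hypothetical gold labels for a small held-out sample of the target task.
gold = [True, False, True, True, False, True, False, True]

# Hypothetical verdicts from two verifiers of different sizes.
candidates = {
    "verifier-large": (70.0, [True, True, True, False, False, True, True, True]),
    "verifier-small": (7.0, [True, False, True, True, False, True, False, False]),
}

reports = [
    VerifierReport(name, size, rubric_accuracy(verdicts, gold))
    for name, (size, verdicts) in candidates.items()
]

# The 0.8 threshold is an assumption, not a value taken from the paper.
ranked = rank_verifiers(reports, min_accuracy=0.8)
for report in ranked:
    print(f"{report.name}: {report.rubric_accuracy:.3f} ({report.params_b}B)")
```

In this toy sample the larger verifier agrees with the gold labels on 5 of 8 items and is filtered out, while the smaller one agrees on 7 of 8 and is kept, which is the kind of outcome that ranking by parameter count alone would miss.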