Entity · benchmark

BLINK

benchmark · active · blink-908c0ad4 · 1 event · first seen Jun 15, 2026

Aliases: BLINK

Co-occurring entities

MathVista · When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks · Direct Preference Optimization (DPO) · Qwen-3-VL-2B · Qwen-2.5-VL-3B · MMMU

More like this (12)

BRIGHT BLOOMZ BLIP-2 RING BOLD BLOOM BLEU BIRD MultiBLiMP ALIGNBEAM MIRROR BLEURT

Recent events (1)

arXiv · cs.AI · Jun 15, 2026

Self-improving VLMs can silently regress when verifier quality is task-mismatched

A new arXiv paper shows that verifier-driven self-DPO, a common recipe for self-improving vision-language models (VLMs), can silently degrade the student model when the verifier's rubric accuracy on the target task is too low. Experiments on Qwen-3-VL-2B and Qwen-2.5-VL-3B across MathVista, MMMU, and BLINK show regressions of 3.4–10.9 percentage points below frozen baselines. Counterintuitively, verifiers that are more accurate but still not accurate enough cause larger regressions than near-random ones. The authors explain the effect with a variance theorem for progress-gated replay and offer operational guidance: measure the verifier's rubric accuracy on the target task before running any verifier-driven loop, and rank verifiers by task-specific quality rather than parameter count.
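The operational guidance above can be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: the function names, the verifier names, the boolean accept/reject verdict format, and the `min_accuracy` threshold are all assumptions made for the example, and the summary does not state what accuracy level is sufficient.

```python
from typing import Dict, List, Sequence, Tuple


def rubric_accuracy(verdicts: Sequence[bool], gold: Sequence[bool]) -> float:
    """Fraction of labeled target-task items where the verifier's
    accept/reject verdict matches the gold label."""
    if len(verdicts) != len(gold) or len(gold) == 0:
        raise ValueError("verdicts and gold must be non-empty and equal in length")
    return sum(v == g for v, g in zip(verdicts, gold)) / len(gold)


def rank_verifiers(
    verdicts_by_verifier: Dict[str, Sequence[bool]], gold: Sequence[bool]
) -> List[Tuple[str, float]]:
    """Rank verifiers by measured target-task accuracy (best first),
    ignoring parameter count entirely."""
    scored = [(name, rubric_accuracy(v, gold)) for name, v in verdicts_by_verifier.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def safe_to_run(accuracy: float, min_accuracy: float) -> bool:
    """Gate the self-improvement loop. min_accuracy is a hypothetical,
    user-chosen threshold, not a value taken from the paper."""
    return accuracy >= min_accuracy


# Hypothetical verdicts on a small labeled sample of the target task.
gold = [True, False, True, True, False]
verdicts = {
    "large_verifier": [True, True, False, True, True],
    "small_verifier": [True, False, True, True, True],
}

ranking = rank_verifiers(verdicts, gold)
best_name, best_accuracy = ranking[0]
run_loop = safe_to_run(best_accuracy, min_accuracy=0.9)
```

In this toy sample the smaller verifier ranks first because it agrees with the gold labels more often, and the loop stays disabled because even the best verifier falls below the assumed threshold.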

Evaluation and Benchmarking · Alignment and RLHF · MathVista · When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks · BLINK