
How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations


Co-occurring entities

OCR-Robust

More like this (12)

OCR-Robust
Uncertainty Quantification for Computer-Use Agents: A Benchmark across Vision-Language Models and GUI Grounding Datasets
Exploring Adversarial Robustness and Safety Alignment in Multilingual Multi-Modal Large Language Models
Reasoning Language Models
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
LabVLA: Grounding Vision-Language-Action Models in Scientific Laboratories
Training-Free Semantic Correction for Autoregressive Visual Models
Reroute, Don't Remove: Recoverable Visual Token Routing for Vision-Language Models
Does VLA Even Know the Basics? Measuring Commonsense and World Knowledge Retention in Vision-Language-Action Models
Modeling Complex Behaviors: Multi-Personality Composition and Dynamic Switching in Vision-Language Models
Same Evidence, Different Answer: Auditing Order Sensitivity in Multimodal Large Language Models
adversarial robustness

Recent events (1)

arXiv · cs.CL

OCR-Robust benchmark evaluates VLM robustness to visual perturbations on OCR-reasoning tasks

Researchers introduce OCR-Robust, a benchmark of 812 samples designed to evaluate how vision-language models handle OCR-reasoning tasks under controlled visual degradation. The benchmark covers documents, scene text, charts, geometry, and tables, applying 5 perturbation types at 3 severity levels each, and evaluates 18 models using metrics including Relative Corruption Retention and a composite Corruption Robustness Index. Key findings show that higher clean accuracy does not guarantee robustness, and that chart and table inputs are substantially more fragile under perturbation than document-like inputs.
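As a rough illustration of the evaluation grid described above, the sketch below enumerates the 5 × 3 perturbation conditions and computes a simple retention ratio. The summary does not give the perturbation names or the formulas for Relative Corruption Retention and the Corruption Robustness Index, so everything here is an assumption: the perturbation names are placeholders, `retention` (perturbed accuracy divided by clean accuracy) is only one plausible reading of a retention-style metric, and `mean_retention` is a hypothetical aggregate, not the paper's composite index.

```python
from itertools import product

# Counts taken from the summary: 5 perturbation types, 3 severity levels.
# The type names are placeholders; the summary does not list the actual ones.
PERTURBATION_TYPES = [f"perturbation_{i}" for i in range(1, 6)]
SEVERITY_LEVELS = [1, 2, 3]


def perturbation_grid():
    """Return every (perturbation type, severity level) pair."""
    return list(product(PERTURBATION_TYPES, SEVERITY_LEVELS))


def retention(clean_acc, perturbed_acc):
    """ASSUMED metric: fraction of clean accuracy kept under perturbation.

    This is a guess at what a retention-style score could look like,
    not the paper's definition of Relative Corruption Retention.
    """
    if clean_acc <= 0:
        raise ValueError("clean accuracy must be positive")
    return perturbed_acc / clean_acc


def mean_retention(clean_acc, perturbed_accs):
    """HYPOTHETICAL aggregate: average retention over all conditions.

    `perturbed_accs` maps (type, severity) pairs to accuracy values.
    """
    values = [retention(clean_acc, acc) for acc in perturbed_accs.values()]
    return sum(values) / len(values)
```

With made-up accuracies (not results from the paper), a model that scores 0.80 on clean inputs and 0.60 under one condition would have a retention of about 0.75 for that condition. A ratio of this kind separates robustness from raw accuracy, which is consistent with the reported finding that higher clean accuracy does not guarantee robustness.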
