technique
Self-Ensembling VLM Chart Extraction
technique · active · provisional
self-ensembling-vlm-chart-extraction-0e49e8bb · 1 event · first seen 21d ago
Aliases: Self-Ensembling VLM Chart Extraction
Co-occurring entities
More like this (12)
FakeVLM
Gaze Heads: How VLMs Look at What They Describe
Gaze Heads: How VLMs Look at What They Describe
gender bias in VLMs
When Good Verifiers Go Bad: Self-Improving VLMs Can Regress on New Tasks
WB-ChartExtract
SmolVLM
Where Does the Answer Come From? Benchmarking View-Level Visual Evidence Identification in Multi-View MLLMs for Autonomous Driving
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
Self-Supervised Learning
UnigramLM
lambda-scaled structural decoding
Recent events (1)
Self-Ensembling Vision-Language Models for Chart Data Extraction
This paper proposes a self-ensembling method for chart-to-table extraction with vision-language models (VLMs): several tabular outputs are sampled from the same VLM for a given chart image, then aggregated by taking the per-cell median of the numerical values. The approach includes convergence detection and uncertainty estimation based on sample dispersion. The authors also introduce WB-ChartExtract, a new benchmark built from World Bank data whose charts contain ~7x more data points than ChartQA. The method achieves up to a 23% relative improvement over single-pass VLM baselines on WB-ChartExtract.
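
The aggregation step described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the sampled tables are already parsed into numeric grids of identical shape, uses the median absolute deviation as the dispersion measure, and stops sampling once the per-cell medians change by less than a relative tolerance. The summary does not specify the paper's dispersion measure or stopping rule, so those choices, along with the names `sample_table`, `cell_median`, `cell_mad`, and `ensemble`, are hypothetical.

```python
from statistics import median

Table = list[list[float]]  # rows x columns of numeric cells


def cell_median(tables: list[Table]) -> Table:
    """Per-cell median across sampled tables (assumed to share one shape)."""
    rows, cols = len(tables[0]), len(tables[0][0])
    return [
        [median(t[r][c] for t in tables) for c in range(cols)]
        for r in range(rows)
    ]


def cell_mad(tables: list[Table], center: Table) -> Table:
    """Per-cell median absolute deviation, used here as an uncertainty proxy.

    Assumption: the source only says uncertainty is based on sample dispersion.
    """
    rows, cols = len(center), len(center[0])
    return [
        [median(abs(t[r][c] - center[r][c]) for t in tables) for c in range(cols)]
        for r in range(rows)
    ]


def max_rel_change(old: Table, new: Table, eps: float = 1e-9) -> float:
    """Largest relative change of any cell between two aggregates."""
    return max(
        abs(n - o) / max(abs(o), eps)
        for old_row, new_row in zip(old, new)
        for o, n in zip(old_row, new_row)
    )


def ensemble(sample_table, max_samples=16, min_samples=3, rel_tol=0.01):
    """Sample tables until the per-cell medians stabilise (assumed criterion).

    `sample_table` is a hypothetical zero-argument callable that runs the VLM
    once and returns one parsed Table.
    """
    if max_samples < 1:
        raise ValueError("max_samples must be at least 1")
    tables, previous, current = [], None, None
    for _ in range(max_samples):
        tables.append(sample_table())
        current = cell_median(tables)
        converged = (
            previous is not None
            and len(tables) >= min_samples
            and max_rel_change(previous, current) <= rel_tol
        )
        if converged:
            break
        previous = current
    return current, cell_mad(tables, current), len(tables)
```

In practice `sample_table` would wrap a VLM call with a non-zero sampling temperature plus a table parser. Cells with a large deviation relative to their median are the ones a downstream user would treat as low confidence.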