Entity · dataset

Perceptually Perturbed Judgment Dataset

dataset · active · perceptually-perturbed-judgment-dataset-d9fae811 · 1 event · first seen Jun 2, 2026

Aliases: Perceptually Perturbed Judgment Dataset

Co-occurring entities

Multimodal Large Language Models · GRPO · LLM-as-a-Judge · Perceptual Judgment Bias

More like this (12)

Perceptual Judgment Bias Percepta JAT Dataset The Many Senses of Visual Similarity: A Text-Prompted Image Perceptual Metric Beyond a Single Judge: Simulating Social Persona Panels for Generative UI Evaluation Judgement-of-Learning Paraphrase Judgment Open Preference Dataset for Text-to-Image Generation or Judgement? A Paradigm Perspective on LLM-Based Emotion-Cause Pair Extraction in Conversation Visual Attribution Distillation An Evaluation Framework for Structured Audio Captions Validated by Controlled Perturbations How Robust is OCR-Reasoning? Evaluating OCR-Reasoning Robustness of Vision-Language Models under Visual Perturbations

Recent events (1)

arXiv · cs.AI · Jun 2, 2026

Mitigating Perceptual Judgment Bias in Multimodal LLM-as-a-Judge via Perceptual Perturbation and Reward Modeling

This paper identifies and analyzes 'Perceptual Judgment Bias' in multimodal LLM judges, where models anchor on response text rather than visual evidence when the two conflict. The authors introduce a Perceptually Perturbed Judgment Dataset using counterfactual responses to isolate perceptual errors, and a training framework combining GRPO-based reward modeling with batch-ranking objectives. Experiments on MLLM-as-a-Judge benchmarks show improved perceptual fidelity, ranking coherence, and alignment with human evaluation.
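For readers unfamiliar with GRPO, the following is a minimal sketch of the group-relative advantage step that GRPO-style training generally relies on: several judgments are sampled for the same input, each receives a scalar reward, and each reward is normalized against its own group. It is not taken from the paper. The function name, the example rewards, the use of population standard deviation, and the `eps` constant are all assumptions made for illustration, and the paper's reward definition and batch-ranking objective are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar per sampled judgment for the same input.
    `eps` (an assumed constant) avoids division by zero when all rewards tie.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 when a sampled judgment agrees with the visual
# evidence, 0.0 when it follows the perturbed response text instead.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Judgments that score above their group's average get a positive advantage and the rest get a negative one, so the policy update favors the better judgments within each group without needing a separate value model. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.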

Evaluation and Benchmarking · Alignment and RLHF · Perceptually Perturbed Judgment Dataset · Multimodal Large Language Models · GRPO