benchmark
concreteness ratings
benchmark · active · provisional
concreteness-ratings-c9e6323a · 1 event · first seen 21d ago
Aliases: concreteness ratings
Co-occurring entities
More like this (12)
concrete score function
imagery ratings
Conners' Teacher Rating Scale-Revised Short Form
human alignment benchmarks (perceptual similarity, gloss, robustness, shape-texture)
Plausibility Evaluation
Concrete MLSAE Specificity Score
Concrete Problems in AI Safety
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
Creative Quality Alignment (CQA)
Compositional Reasoning Depth Predicts Clinical AI Failure
C-index
Recent events (1)
Real Images, Worse Judgments: Evaluating VLMs on Concreteness and Imagery
This paper evaluates whether vision-language models (VLMs) benefit from real image context when making lexical judgments about word concreteness and imagery. The authors find that real-image contexts frequently hurt alignment with human ratings, especially when visual evidence is least relevant to the word being judged. Probing and canonical correlation analysis reveal that real images cause representational shifts and increased sensitivity to spurious visual cues. Instructing models to focus on text-only content at inference time partially mitigates this degradation.
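One way to make "alignment with human ratings" concrete is a rank correlation between model-produced scores and human concreteness norms. The sketch below is only an illustration under assumptions: the summary does not say which alignment metric the authors use, so Spearman's rho is assumed here, and the words and every score are made-up placeholder values, not data from the paper.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(rank(x), rank(y))


# Hypothetical ratings on a 1-5 concreteness scale (placeholder values).
words = ["apple", "hammer", "journey", "justice", "idea"]
human = [4.9, 4.7, 3.0, 1.6, 1.4]
model_text_only = [4.8, 4.5, 2.9, 2.0, 1.5]   # assumed text-only condition
model_with_image = [4.0, 4.6, 1.2, 3.5, 2.0]  # assumed real-image condition

rho_text = spearman(human, model_text_only)
rho_image = spearman(human, model_with_image)
print(f"text-only rho:  {rho_text:.2f}")
print(f"with-image rho: {rho_image:.2f}")
```

In this toy setup, a lower rho in the image condition than in the text-only condition would correspond to the kind of degradation the summary describes; real comparisons would need the actual norms and model outputs.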