Entity · benchmark

concreteness ratings

benchmark · active · concreteness-ratings-c9e6323a · 1 event · first seen May 27, 2026

Aliases: concreteness ratings

Co-occurring entities

canonical correlation analysis · Vision-Language Models · probing classifiers · imagery ratings

More like this (12)

concrete score function
imagery ratings
Conners' Teacher Rating Scale-Revised Short Form
human alignment benchmarks (perceptual similarity, gloss, robustness, shape-texture)
Plausibility Evaluation
Two-Level Meta-Rubrics for Evaluating Open-Ended Generation: GAMUT, a Benchmark for Factual Completeness
Concrete ML
Rating the Pitch, Not the Product: User Evaluations of LLMs Reflect Expectations More Than Performance
Manim Visual Quality Score
SAE Specificity Score
Concrete Problems in AI Safety
Human Creativity Benchmark

Recent events (1)

5 · arXiv · cs.CL · May 27, 2026

Real Images, Worse Judgments: Evaluating VLMs on Concreteness and Imagery

This paper evaluates whether vision-language models (VLMs) benefit from real image context when making lexical judgments about word concreteness and imagery. The authors find that real-image contexts frequently hurt alignment with human ratings, especially when visual evidence is least relevant to the word being judged. Probing and canonical correlation analysis reveal that real images cause representational shifts and increased sensitivity to spurious visual cues. Instructing models to focus on text-only content at inference time partially mitigates this degradation.

Evaluation and Benchmarking · Multimodal Progress · concreteness ratings · canonical correlation analysis · Vision-Language Models · +2 more
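
The summary above describes "alignment with human ratings" without naming a metric. As a rough illustration only, the sketch below assumes alignment is measured with Spearman rank correlation between a model's scores and human concreteness ratings; that choice is an assumption, not something the summary states. The words and numbers are hypothetical placeholders (not real norms or model outputs), and `average_ranks` and `spearman` are illustrative helper names.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical placeholder ratings for four words (illustration only).
    words = ["apple", "justice", "hammer", "idea"]
    human = [4.9, 1.5, 4.7, 2.0]
    model_text_only = [4.8, 1.9, 4.5, 2.2]
    model_with_image = [4.0, 3.9, 4.6, 2.1]

    print("text-only alignment: ", spearman(human, model_text_only))
    print("with-image alignment:", spearman(human, model_with_image))
```

Comparing the two correlations for the same word list is one simple way to express whether adding image context helps or hurts alignment; the paper's actual protocol may differ.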