benchmark
Personal Relation Task
benchmark · active · provisional
personal-relation-task-ab632d16 · 1 event · first seen 16d ago
Aliases: Personal Relation Task
Recent events (1)
LLMs Show Inverted Compositional Strengths vs. Humans on Reference Resolution Task
This paper evaluates LLMs and humans on the Personal Relation Task (Paperno 2022), distinguishing Extensional tasks (resolving what an expression refers to) from Intensional tasks (representing the expression's structured sense, or formula). Humans outperform LLMs on Extensional tasks, while LLMs outperform humans on Intensional tasks, an inverted pattern of strengths. The authors argue that this asymmetry points to the absence of referential grounding in LLM training as a key gap in human-like language understanding.
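The Extensional/Intensional distinction can be made concrete with a toy sketch. Everything in it is an assumption for illustration only: the names, the relations, the `WORLD` table, and the `parse` and `resolve` helpers are hypothetical and are not drawn from the benchmark's actual data or format. `parse` stands in for the Intensional side (building a structured formula), and `resolve` for the Extensional side (finding the referent in a given world).

```python
# Toy world: hypothetical people and relations, not taken from the benchmark.
WORLD = {
    ("Ann", "friend"): "Bob",
    ("Bob", "enemy"): "Cat",
    ("Cat", "friend"): "Ann",
}


def parse(expression):
    """Intensional view: turn "Ann's friend's enemy" into a nested formula."""
    head, *relations = expression.split("'s ")
    formula = head
    for relation in relations:
        # Each possessive step wraps the formula built so far.
        formula = (relation, formula)
    return formula


def resolve(formula, world):
    """Extensional view: evaluate a formula to the person it refers to."""
    if isinstance(formula, str):
        return formula  # a bare name refers to itself
    relation, inner = formula
    return world[(resolve(inner, world), relation)]


formula = parse("Ann's friend's enemy")
referent = resolve(formula, WORLD)
print(formula)   # structured sense of the expression
print(referent)  # the individual it picks out in WORLD
```

In this sketch the formula depends only on the expression, while the referent also depends on the world: changing `WORLD` changes the output of `resolve` but not of `parse`.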