Personal Relation Task

benchmark · active · provisional · personal-relation-task-ab632d16 · 1 event · first seen 16d ago

Aliases: Personal Relation Task

Recent events (1)

5 · arXiv · cs.CL · 16d ago

LLMs Show Inverted Compositional Strengths vs. Humans on Reference Resolution Task

This paper evaluates LLMs and humans on the Personal Relation Task (Paperno 2022), distinguishing Extensional tasks (resolving what an expression refers to) from Intensional tasks (representing an expression's structured sense, or formula). Humans outperform LLMs on Extensional tasks, while LLMs outperform humans on Intensional tasks, an inverted pattern of strengths. The authors argue that this asymmetry reflects the absence of referential grounding in LLM training, which they identify as a key gap in human-like language understanding.