Entity · benchmark

Personal Relation Task

benchmark · active · personal-relation-task-ab632d16 · 1 event · first seen Jun 1, 2026

Aliases: Personal Relation Task

Co-occurring entities

large language models · referential grounding · compositional generalization · Paperno 2022

More like this (12)

agent-task efficiency · Skill-RM · Persona-Pruner · Emotional Need-aware Proactive Memory Retrieval · Relational Deep Learning · Partner Capability Estimation for Task-Agnostic Adaptation in Ad-Hoc Teamwork · ExpRL · ContextRL · Functional Attention · Adaptive Parallel Reasoning · Courteous Anticipation: Improving Long-Lived Task Planning in Persistent Shared Environments · Agentic RL

Recent events (1)

5 · arXiv · cs.CL · Jun 1, 2026

LLMs Show Inverted Compositional Strengths vs. Humans on Reference Resolution Task

This paper evaluates LLMs and humans on the Personal Relation Task (Paperno 2022), distinguishing between Extensional tasks (resolving what an expression refers to) and Intensional tasks (representing the expression's structured sense as a formula). The study finds an inverted pattern of strengths: humans outperform LLMs on Extensional tasks, while LLMs outperform humans on Intensional tasks. The authors argue that this asymmetry reflects the absence of referential grounding in LLM training, which they identify as a key gap in human-like language understanding.
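As a rough illustration of the Extensional/Intensional distinction described above, the sketch below uses a hypothetical four-person toy world. The names, the relations, the `parse` and `resolve` helpers, and the tuple encoding of formulas are all assumptions made for this example; they are not the benchmark's actual data, format, or evaluation code.

```python
# Hypothetical toy world: each relation maps every person to exactly one person.
WORLD = {
    "friend": {"Ann": "Bob", "Bob": "Ann", "Cy": "Dee", "Dee": "Cy"},
    "enemy": {"Ann": "Cy", "Bob": "Dee", "Cy": "Ann", "Dee": "Bob"},
}


def parse(expression):
    """Intensional view: turn "Ann's friend's enemy" into a nested formula."""
    head, *relations = expression.replace("'s", "").split()
    formula = head
    for relation in relations:
        # Each possessive wraps the formula built so far: (relation, argument).
        formula = (relation, formula)
    return formula


def resolve(formula, world=WORLD):
    """Extensional view: evaluate a formula to the person it refers to."""
    if isinstance(formula, str):
        return formula  # a bare name refers to itself
    relation, argument = formula
    return world[relation][resolve(argument, world)]


expression = "Ann's friend's enemy"
formula = parse(expression)   # structure only; no world lookup needed
referent = resolve(formula)   # requires consulting the world
print(formula, "->", referent)
```

In this sketch, `parse` never touches `WORLD`, while `resolve` cannot produce an answer without it. That separation is one way to picture why a system could handle the structured representation well and still struggle to identify the referent.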

Evaluation and Benchmarking · Alignment and RLHF · large language models · referential grounding · compositional generalization · +2 more