Entity · dataset

WhoSaidIt

dataset · active · whosaidit-12d5ed80 · 1 event · first seen May 26, 2026

Aliases: WhoSaidIt

Co-occurring entities

human-LLM collaborative annotation · speaker-attribute classification · disagreement-focused sampling

More like this (12)

DeiT · SawyerHood · SiT · SHuBERT-ByT5 · StoryTeller · FacialTalker · SentencePiece · DoWhy · teLLMe · HellaSwag · iSarcasm · Scott Wiener

Recent events (1)

4 · arXiv · cs.CL · May 26, 2026

WhoSaidIt: Human-LLM Collaborative Annotation for Multilingual Speaker-Attribute Classification

This paper proposes a human-LLM collaborative re-annotation framework for stabilizing noisy multilingual speaker-attribute labels under resource constraints. LLMs surface recurring annotation rationales through iterative interaction with experts, and disagreement-focused sampling targets items for re-annotation. The resulting WhoSaidIt dataset covers nine speaker-attribute labels across multiple languages. Benchmarking of recent LLMs reveals substantial cross-lingual annotation divergence and highlights both the capabilities and the limitations of LLMs in this classification task.
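The summary above does not say how disagreement-focused sampling is implemented. As a rough illustration only, the sketch below assumes one common approach: score each item by the Shannon entropy of the labels it received and send the highest-scoring items back for re-annotation. The function names, the `budget` parameter, the item ids, and the placeholder labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def label_entropy(labels):
    """Shannon entropy (in bits) of the labels assigned to one item."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def select_for_reannotation(annotations, budget):
    """Return up to `budget` item ids, most-disputed first.

    annotations: dict mapping item id -> list of labels from different
    annotators (human or LLM). Entropy is an assumed disagreement score,
    not necessarily the one used for WhoSaidIt.
    """
    ranked = sorted(
        annotations,
        key=lambda item: label_entropy(annotations[item]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical items with placeholder labels.
annotations = {
    "utt-1": ["label_a", "label_a", "label_a"],  # full agreement
    "utt-2": ["label_a", "label_b", "label_a"],  # partial disagreement
    "utt-3": ["label_a", "label_b", "label_c"],  # full disagreement
}
print(select_for_reannotation(annotations, budget=2))  # ['utt-3', 'utt-2']
```

Entropy is only one possible score; pairwise disagreement rates or model-confidence margins would fit the same selection loop.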

Evaluation and Benchmarking · Agent and Tool Ecosystem · human-LLM collaborative annotation · speaker-attribute classification · WhoSaidIt