paper
From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation
paper · active · provisional
from-self-to-other-evaluating-demographic-perspective-taking-in-llm-hate-speech-annotation-8e20d153 · 1 event · first seen 11d ago
Aliases: From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation
More like this (12)
human-LLM collaborative annotation
Measuring Human Value Expression in Social Media Texts: Calibrated LLM Annotation and Encoder Transfer
UC Berkeley Measuring Hate Speech Corpus
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
LLM-based content classification
hate-based rhetoric
A Resource for Enthymeme Detection in Controversial Political Discourse
SpeechLLM
LLM-based code change labeling pipeline
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
Recent events (1)
LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation
A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.