Entity · paper

From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

paper · active · from-self-to-other-evaluating-demographic-perspective-taking-in-llm-hate-speech-annotation-8e20d153 · 1 event · first seen Jun 5, 2026


Co-occurring entities

Meta Llama-3.1-8B

More like this (12)

human-LLM collaborative annotation
Innocuous-Seeming Data, Latent Ideology: Ideological Generalisation in Finetuned LLMs
Measuring Human Value Expression in Social Media Texts: Calibrated LLM Annotation and Encoder Transfer
Validity of LLMs as data annotators: AMALIA on authority
UC Berkeley Measuring Hate Speech Corpus
The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs
Revising Context, Shifting Simulated Stance: Auditing LLM-Based Stance Simulation in Online Discussions
Beyond Benchmarks: Exposing the Hidden Crisis in Bangla Hate Speech Detection
Beyond Third-Person Audits: Situated Interaction Auditing for User-Centered LLM Bias Research
Generation or Judgement? A Paradigm Perspective on LLM-Based Emotion-Cause Pair Extraction in Conversation
Linguistic Monoculture in LLM-Assisted Language Use
From Plausible to Actionable: A Position on LLM Self-Explanations

Recent events (1)

arXiv · cs.CL · Jun 5, 2026

LLMs fail to consistently simulate demographic perspective-taking in hate speech annotation

A new arXiv paper evaluates whether persona-conditioned LLMs can replicate how different demographic groups perceive hate speech, testing three dimensions: inter-group disagreement, in-group sensitivity, and vicarious prediction. No model consistently captures all three dimensions, and performance is highly model-dependent rather than emerging reliably from identity prompts alone. Vicarious prompting with Llama 3.1 provides the closest approximation to human disagreement patterns across demographic axes. The findings have implications for using LLMs as proxies for diverse human annotators in content moderation tasks.

Evaluation and Benchmarking · AI Safety Research · From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation · Meta Llama-3.1-8B