Entity · technique

hate speech detection

technique · active · hate-speech-detection-06fe9ce6 · 3 events · first seen May 21, 2026

Aliases: hate speech detection

Co-occurring entities

Token-level Rationales · Faithfulness Evaluation · Plausibility Evaluation · Soft Label Supervision · MHS · POPQUORN · gated demographic residual model · annotator disagreement · Emotion Recognition · Text Analytics Evaluation Framework · X (Twitter) · Sentiment Analysis

More like this (12)

stance detection · Beyond Benchmarks: Exposing the Hidden Crisis in Bangla Hate Speech Detection · UC Berkeley Measuring Hate Speech Corpus · hate-based rhetoric · Voice Activity Detection (VAD) · Sentiment Analysis · SH-Detection · Safety Detection Classifier · misalignment detection · Speech-to-Speech · What Do Deepfake Speech Detectors Actually Hear? · From Self to Other: Evaluating Demographic Perspective-Taking in LLM Hate Speech Annotation

Recent events (3)

5 · arXiv · cs.CL · Jun 1, 2026

Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

This paper investigates human disagreement in token-level rationale annotations for hate speech detection, a dimension less studied than label disagreement. The authors unify diverse models, training strategies, loss functions, and evaluation metrics under a single protocol, systematically comparing hard and soft label/rationale representation spaces. Results show that both hard and soft metrics favor softer representations, suggesting that soft supervision better captures human reasoning variation in subjective NLP tasks. The work calls for rethinking evaluation frameworks for classification and explainability in subjective NLP.

Evaluation and Benchmarking · Alignment and RLHF · Token-level Rationales · Faithfulness Evaluation · Plausibility Evaluation · +2 more
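The contrast between hard and soft label supervision in the summary above can be illustrated with a minimal sketch. This is not the paper's protocol: the function names, the three annotator votes, and the predicted probabilities are all hypothetical, chosen only to show how a majority-vote target and a vote-distribution target score the same prediction differently under cross-entropy.

```python
import math

def soft_label(votes, num_classes):
    """Turn per-annotator class votes into a probability distribution."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]

def hard_label(votes, num_classes):
    """Majority vote as a one-hot vector (ties go to the lowest class index)."""
    dist = soft_label(votes, num_classes)
    winner = dist.index(max(dist))
    return [1.0 if i == winner else 0.0 for i in range(num_classes)]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Hypothetical example: three annotators, 1 = hateful, 0 = not hateful.
votes = [1, 1, 0]
pred = [0.3, 0.7]  # assumed model output, for illustration only

soft = soft_label(votes, 2)   # keeps the 2-to-1 split
hard = hard_label(votes, 2)   # discards the dissenting vote

loss_soft = cross_entropy(soft, pred)
loss_hard = cross_entropy(hard, pred)
print(f"soft target {soft} -> loss {loss_soft:.4f}")
print(f"hard target {hard} -> loss {loss_hard:.4f}")
```

With the hard target the dissenting annotator has no effect on the loss; with the soft target the model is also penalized for how far its probabilities sit from the observed split. The same idea would extend to per-token rationale scores, though the paper's exact formulation is not given here.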

5 · arXiv · cs.CL · May 27, 2026

When Does Demographic Information Help? Data and Modeling Regimes for Perspective-Aware Hate Speech Detection

This paper investigates when demographic features improve hate speech detection models that account for annotator perspectives, finding that gains are not universal but depend on specific data and modeling conditions. The authors identify that demographic information helps most in regimes with low training disagreement, high test disagreement, sufficient training data, and strong demographic overlap between train and test sets. They introduce a gated demographic residual model that selectively applies demographic adjustments to text-only predictions, demonstrating effectiveness on high-disagreement and low-confidence examples using the MHS and POPQUORN datasets. The work cautions against assuming demographic features are universally beneficial in subjective NLP tasks.

Evaluation and Benchmarking · Alignment and RLHF · MHS · POPQUORN · gated demographic residual model · +2 more
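One way to read "selectively applies demographic adjustments to text-only predictions" is as a residual added to a text-only logit and scaled by a gate between 0 and 1. The sketch below assumes that reading. It is not the authors' architecture: the function names, the scalar inputs, and the sigmoid gate are all hypothetical stand-ins for whatever learned components the paper uses.

```python
import math

def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_prediction(text_logit, demo_residual, gate_logit):
    """Add a demographic adjustment to a text-only logit, scaled by a gate.

    All three inputs are assumed to come from learned components;
    here they are plain numbers so the arithmetic is visible.
    """
    gate = sigmoid(gate_logit)  # near 0: ignore demographics; near 1: apply fully
    return sigmoid(text_logit + gate * demo_residual)

text_only = sigmoid(0.2)

# Gate nearly closed: the prediction stays close to the text-only one.
closed = gated_prediction(text_logit=0.2, demo_residual=1.5, gate_logit=-10.0)

# Gate nearly open: the demographic residual shifts the prediction.
opened = gated_prediction(text_logit=0.2, demo_residual=1.5, gate_logit=10.0)

print(f"text-only {text_only:.4f}, gate closed {closed:.4f}, gate open {opened:.4f}")
```

Under this assumed design, a gate that opens mainly on high-disagreement or low-confidence examples would leave confident text-only predictions largely untouched, which matches the selective behavior the summary describes.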

5 · arXiv · cs.CL · May 21, 2026

Text Analytics Evaluation Framework: Benchmarking LLMs on Social Media NLP Tasks

Researchers introduce a 470-question evaluation framework to assess LLM performance on aggregated social media text, applied to Twitter datasets across sentiment analysis, hate speech detection, and emotion recognition. Results show performance degrades substantially as input scale exceeds 500 instances, particularly for open-weights models on numerical tasks. Multi-label and target-dependent scenarios also show notable performance drops, and task complexity progressively erodes accuracy from basic semantic identification to comparison and counting operations. The findings point to architectural bottlenecks in current LLMs for rigorous quantitative analysis over large text collections.

Long Context Evolution · Evaluation and Benchmarking · Emotion Recognition · Text Analytics Evaluation Framework · X (Twitter) · +3 more