ToxiREX: A Dataset on Toxic REasoning in ConteXt

Recent events (1)

arXiv · cs.CL · 38h ago

ToxiREX: Multilingual contextual dataset for implicit toxicity detection with structured reasoning schema

Researchers introduce ToxiREX, a multilingual Reddit-based dataset for detecting implicit and context-dependent toxicity in six languages (English, Arabic, Turkish, Spanish, German, Dutch). The data is anchored to real-world events such as the 2023 Turkey earthquakes and the Russian invasion of Ukraine. It comprises 125K LLM-annotated training comments and ~3K human-annotated test comments, annotated with a toxic reasoning schema that captures implicit toxicity and maps to existing taxonomies. Baselines from prompted and fine-tuned language models perform above chance but well short of optimal, indicating that the task remains challenging. The authors present ToxiREX as the first dataset to combine multilingual coverage, conversational context, and implicit toxicity with schema-based structured annotations.