ToxiREX
ToxiREX: Multilingual contextual dataset for implicit toxicity detection with structured reasoning schema
Researchers introduce ToxiREX, a multilingual Reddit-based dataset for detecting implicit and context-dependent toxicity in six languages (English, Arabic, Turkish, Spanish, German, Dutch). Its conversations are anchored to real-world events such as the 2023 Turkey earthquakes and the Russian invasion of Ukraine. The dataset contains 125K LLM-annotated training comments and roughly 3K human-annotated test comments. Annotations follow a toxic reasoning schema that captures implicit toxicity and maps to existing toxicity taxonomies. Baseline results from prompted and fine-tuned language models are better than random but still far from satisfactory, which indicates that the task remains challenging. The authors present ToxiREX as the first dataset to combine multilingual coverage, conversational context, and implicit toxicity with schema-based structured annotations.
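To make the idea of a context-dependent, structured record more concrete, here is a minimal Python sketch of what one such comment might look like in code. The summary does not give the actual schema, so every field name here (`context`, `is_implicit`, `annotator`, and so on) and the `model_input` helper are hypothetical illustrations, not ToxiREX's real format. Only the six language codes and the LLM/human annotation split come from the description above.

```python
from dataclasses import dataclass, field

# The six languages named in the summary, as ISO 639-1 codes.
LANGUAGES = {"en", "ar", "tr", "es", "de", "nl"}


@dataclass
class CommentRecord:
    """Hypothetical record layout; NOT the dataset's actual schema."""

    comment: str                    # the target comment to classify
    language: str                   # one of LANGUAGES
    event: str                      # real-world event the thread is about
    context: list[str] = field(default_factory=list)  # earlier comments in the thread
    is_toxic: bool = False
    is_implicit: bool = False       # toxicity implied rather than stated outright
    annotator: str = "llm"          # "llm" (training split) or "human" (test split)

    def __post_init__(self) -> None:
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")
        if self.annotator not in {"llm", "human"}:
            raise ValueError(f"unknown annotator type: {self.annotator}")
        if self.is_implicit and not self.is_toxic:
            raise ValueError("is_implicit only applies to toxic comments")


def model_input(record: CommentRecord) -> str:
    """Join the thread context and the target comment into one string,
    so a classifier sees the comment together with what it replies to."""
    lines = [f"[context] {c}" for c in record.context]
    lines.append(f"[target] {record.comment}")
    return "\n".join(lines)


example = CommentRecord(
    comment="<target comment text>",
    language="tr",
    event="2023 Turkey earthquakes",
    context=["<parent comment text>"],
    annotator="human",
)
print(model_input(example))
```

Keeping the preceding comments in the same record reflects the point that context-dependent toxicity cannot be judged from the target comment alone.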