The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar
More like this (12)
TOBA tokenizer
Komi-Yazva–Russian Parallel Corpus
A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation
Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan
E-TTS
Urdu Katib Handwritten Dataset
Text Aphasia Battery (TAB)
ToxiREX: A Dataset on Toxic REasoning in ConteXt
Context-Aware Distillation and Ablation for Text2DSL
MOSS-TTS
The Anatomy of the CTC Oracle Gap: Acoustic Exhaustion and Linguistic Recovery
Reasoning over Grammar: Can Synthetic Linguistic Reasoning Traces Enhance Low-Resource Machine Translation?
Recent events (1)
Tatoxa: State-of-the-art text detoxification system for the low-resource Tatar language
Researchers introduce Tatoxa, a text detoxification system for the Tatar language, along with a new fine-tuning and evaluation dataset for this low-resource setting. Comparative experiments show Tatoxa outperforms both open-source and proprietary LLMs on quality metrics. Cross-lingual transfer experiments find that even culturally close Russian data transfers poorly compared to native Tatar training data, highlighting the limits of cross-lingual approaches for low-resource languages.