The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar

Recent events (1)

arXiv · cs.CL · 5d ago

Tatoxa: State-of-the-art text detoxification system for the low-resource Tatar language

Researchers introduce Tatoxa, a text detoxification system for the Tatar language, along with a new fine-tuning and evaluation dataset for this low-resource setting. Comparative experiments show Tatoxa outperforms both open-source and proprietary LLMs on quality metrics. Cross-lingual transfer experiments find that even culturally close Russian data transfers poorly compared to native Tatar training data, highlighting the limits of cross-lingual approaches for low-resource languages.