Tatoxa

product · active · provisional · tatoxa-6ff3407f · 1 event · first seen 5d ago

Aliases: Tatoxa

Recent events (1)

3 · arXiv · cs.CL · 5d ago

Tatoxa: State-of-the-art text detoxification system for the low-resource Tatar language

Researchers introduce Tatoxa, a text detoxification system for the Tatar language, along with a new dataset for fine-tuning and evaluation in this low-resource setting. In comparative experiments, Tatoxa outperforms both open-source and proprietary LLMs on quality metrics. Cross-lingual transfer experiments find that even data from Russian, a culturally close language, transfers poorly compared with native Tatar training data, highlighting the limits of cross-lingual approaches for low-resource languages.