Komi-Yazva–Russian Parallel Corpus
Recent events (1)
First Komi-Yazva–Russian parallel corpus and LLM translation evaluation protocol for an endangered low-resource language
Researchers introduce the first Komi-Yazva–Russian parallel corpus: 457 aligned sentence pairs drawn from 74 narrative texts, paired with an evaluation protocol for studying LLM translation under extreme data scarcity. The protocol combines story-level cross-validation, deterministic retrieval-based few-shot prompting, and both reference-based and judge-based metrics, so that evaluation is leakage-aware and reproducible. Results show that LLMs produce non-trivial translations, but performance varies strongly by model family; retrieval-based few-shot prompting consistently outperforms zero-shot prompting, though the gains plateau quickly. The work frames the corpus as both a dataset contribution and a reproducible testbed for endangered-language machine translation research.
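As a rough illustration of the two leakage-control ideas named in the summary, the sketch below assigns whole stories to folds and then picks few-shot examples with a deterministic retriever. It is not the authors' implementation: the summary does not say which retriever, fold count, or prompt template they use, so the character-trigram Jaccard similarity, the record fields `story_id`, `src`, and `tgt`, and the prompt wording are all assumptions made for this example.

```python
def story_level_folds(pairs, n_folds):
    """Assign whole stories to folds so no story is split across folds.

    `pairs` is a list of dicts with hypothetical keys
    'story_id', 'src' (Komi-Yazva) and 'tgt' (Russian).
    """
    # Sorting the story ids makes the assignment reproducible.
    story_ids = sorted({p["story_id"] for p in pairs})
    fold_of = {sid: i % n_folds for i, sid in enumerate(story_ids)}
    folds = [[] for _ in range(n_folds)]
    for p in pairs:
        folds[fold_of[p["story_id"]]].append(p)
    return folds


def char_ngrams(text, n=3):
    """Set of character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def retrieve_examples(query, pool, k):
    """Return the k pool items most similar to `query`.

    Similarity is Jaccard overlap of character trigrams (an assumed
    stand-in for whatever retriever the protocol actually uses).
    Ties are broken by source text, so the result does not depend
    on the order of `pool`.
    """
    q = char_ngrams(query)

    def score(p):
        c = char_ngrams(p["src"])
        union = q | c
        return len(q & c) / len(union) if union else 0.0

    ranked = sorted(pool, key=lambda p: (-score(p), p["src"]))
    return ranked[:k]


def build_prompt(query, examples):
    """Format a few-shot prompt; the template is illustrative only."""
    lines = ["Translate from Komi-Yazva to Russian."]
    for ex in examples:
        lines.append(f"Komi-Yazva: {ex['src']}")
        lines.append(f"Russian: {ex['tgt']}")
    lines.append(f"Komi-Yazva: {query}")
    lines.append("Russian:")
    return "\n".join(lines)
```

In use, each fold would serve once as the test set while the remaining folds form the retrieval pool, so a test sentence never sees examples from its own story. Passing `k=0` to `retrieve_examples` gives the zero-shot prompt for comparison.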