
Komi-Yazva–Russian Parallel Corpus

dataset · active · provisional · komi-yazva-russian-parallel-corpus-ae90e3f0 · 1 event · first seen 11d ago

Aliases: Komi-Yazva–Russian Parallel Corpus

Recent events (1)

arXiv · cs.CL · 11d ago

First Komi-Yazva–Russian parallel corpus and LLM translation evaluation protocol for an endangered, low-resource language

Researchers introduce the first Komi-Yazva–Russian parallel corpus: 457 aligned sentence pairs drawn from 74 narrative texts. It is paired with an evaluation protocol for studying LLM translation under extreme data scarcity. To keep evaluation leakage-aware and reproducible, the protocol combines story-level cross-validation (all sentences from a given story fall in the same fold), deterministic retrieval-based few-shot prompting, and both reference-based and judge-based metrics. Results show that LLMs produce non-trivial translations, but performance varies strongly by model family. Retrieval-based few-shot prompting consistently outperforms zero-shot prompting, though the gains plateau quickly. The authors frame the corpus as both a dataset contribution and a reproducible testbed for endangered-language machine translation research.
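The two leakage controls named above, story-level cross-validation and deterministic retrieval of few-shot examples, can be illustrated with a small Python sketch. It is not the paper's implementation. It assumes each sentence pair is a dict with hypothetical keys `story_id`, `kv` (Komi-Yazva) and `ru` (Russian). The round-robin fold assignment, the character-trigram Jaccard retriever and the prompt template are all illustrative choices that the source does not specify.

```python
from typing import Dict, List, Set, Tuple

Pair = Dict[str, str]  # hypothetical record: {"story_id", "kv", "ru"}


def story_folds(pairs: List[Pair], k: int) -> List[Tuple[List[Pair], List[Pair]]]:
    """Split pairs into k folds so that every story lands entirely in one fold.

    Round-robin assignment over sorted story IDs is an illustrative choice.
    """
    stories = sorted({p["story_id"] for p in pairs})
    fold_of = {s: i % k for i, s in enumerate(stories)}
    folds = []
    for f in range(k):
        test = [p for p in pairs if fold_of[p["story_id"]] == f]
        train = [p for p in pairs if fold_of[p["story_id"]] != f]
        folds.append((train, test))
    return folds


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; two empty sets score 0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(query: str, train: List[Pair], k: int = 3) -> List[Pair]:
    """Return the k training pairs most similar to the query (assumed retriever).

    Sorting on (-score, index) breaks ties by position, so the selection is
    identical on every run. k=0 yields the zero-shot setting.
    """
    q = char_ngrams(query)
    ranked = sorted(
        (-jaccard(q, char_ngrams(p["kv"])), i) for i, p in enumerate(train)
    )
    return [train[i] for _, i in ranked[:k]]


def build_prompt(query: str, examples: List[Pair]) -> str:
    """Assemble a plain-text few-shot prompt (template is hypothetical)."""
    blocks = ["Translate from Komi-Yazva to Russian."]
    for ex in examples:
        blocks.append(f"Komi-Yazva: {ex['kv']}\nRussian: {ex['ru']}")
    blocks.append(f"Komi-Yazva: {query}\nRussian:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Placeholder records only; these are not corpus sentences.
    pairs = [
        {"story_id": s, "kv": f"source {s}{i}", "ru": f"target {s}{i}"}
        for s in "abcd"
        for i in range(3)
    ]
    for train, test in story_folds(pairs, k=2):
        shots = retrieve_examples(test[0]["kv"], train, k=2)
        print(build_prompt(test[0]["kv"], shots))
```

Retrieval searches only the training stories of the current fold. As a result, a few-shot example can never come from the same story as the sentence being translated. Varying `k` (with `k=0` as the zero-shot baseline) is one way to probe how quickly few-shot gains level off.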