Entity · dataset

Komi-Yazva–Russian Parallel Corpus

dataset · active · komi-yazva-russian-parallel-corpus-ae90e3f0 · 1 event · first seen Jun 5, 2026

Aliases: Komi-Yazva–Russian Parallel Corpus

Co-occurring entities

A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation

More like this (12)

A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation
AG-MG Parallel Corpus
Reeve Foundation Multilingual Corpus
SearchGen-Corpus-1M
VUA Metaphor Corpus
The Tatoxa System for Text Detoxification in Low-Resource Languages: The Case of Tatar
International Corpus of English
Filtered-Corpus Training
UC Berkeley Measuring Hate Speech Corpus
A Human-in-the-Loop Corpus for LLM-Based Simplification of Scientific Summaries
Urdu Katib Handwritten Dataset
Echoes Across Vietnam's Highlands, Delta, and Coast: A Multilingual Corpus for Cham, Khmer, and Tay-Nung

Recent events (1)

3 · arXiv · cs.CL · Jun 5, 2026

First Komi-Yazva–Russian parallel corpus and LLM translation evaluation protocol for an endangered low-resource language

Researchers introduce the first Komi-Yazva–Russian parallel corpus of 457 aligned sentence pairs from 74 narrative texts, paired with a rigorous evaluation protocol for studying LLM translation under extreme data scarcity. The protocol includes story-level cross-validation, deterministic retrieval-based few-shot prompting, and both reference-based and judge-based metrics to ensure leakage-aware, reproducible evaluation. Results show LLMs produce non-trivial translations but performance varies strongly by model family; retrieval-based few-shot prompting consistently outperforms zero-shot, though gains plateau quickly. The work frames the corpus as both a dataset contribution and a reproducible testbed for endangered-language machine translation research.
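The two leakage-related parts of the protocol, story-level cross-validation and deterministic retrieval of few-shot examples, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the round-robin fold assignment, and the character-trigram Jaccard similarity are all assumptions made for illustration, since the summary does not say which retriever or fold scheme the paper uses. The strings in the usage lines are placeholders, not corpus data.

```python
from collections import defaultdict


def story_level_folds(story_ids, n_folds=5):
    """Split sentence indices into folds so that no story spans two folds.

    story_ids[i] is the story that sentence pair i belongs to.
    Assumption: stories are assigned round-robin in sorted order,
    which keeps the split deterministic without a random seed.
    """
    unique_stories = sorted(set(story_ids))
    fold_of_story = {s: i % n_folds for i, s in enumerate(unique_stories)}
    folds = defaultdict(list)
    for idx, story in enumerate(story_ids):
        folds[fold_of_story[story]].append(idx)
    return [folds[f] for f in range(n_folds)]


def char_ngrams(text, n=3):
    """Return the set of character n-grams of a padded, lowercased string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def retrieve_examples(query, pool, k=3):
    """Pick the k most similar (source, target) pairs from the training pool.

    Assumption: similarity is Jaccard overlap of character trigrams.
    Ties are broken by pool position, so the same query and pool
    always yield the same few-shot examples.
    """
    query_grams = char_ngrams(query)

    def sort_key(item):
        idx, (source, _target) = item
        grams = char_ngrams(source)
        union = query_grams | grams
        similarity = len(query_grams & grams) / len(union) if union else 0.0
        return (-similarity, idx)

    ranked = sorted(enumerate(pool), key=sort_key)
    return [pair for _idx, pair in ranked[:k]]


# Placeholder data: five sentence pairs drawn from three stories.
story_ids = ["s1", "s1", "s2", "s3", "s3"]
folds = story_level_folds(story_ids, n_folds=2)

# Placeholder training pool of (source, target) pairs.
pool = [("aaa bbb", "t0"), ("aaa ccc", "t1"), ("zzz yyy", "t2")]
shots = retrieve_examples("aaa bbb", pool, k=2)
```

In a full evaluation loop, the retrieval pool for each test sentence would be built only from the training folds, so that sentences from the same story as the test item never appear among the few-shot examples.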

Evaluation and Benchmarking · A Komi-Yazva–Russian Parallel Corpus and Evaluation Protocol for Zero- and Few-Shot LLM Translation · Komi-Yazva–Russian Parallel Corpus