BLEU

benchmark · active · provisional · bleu-11255080 · 1 event · first seen 8d ago

Aliases: BLEU

Recent events (1)

arXiv · cs.CL · 8d ago

Synthetic data bootstrapping and LoRA fine-tuning for Q'eqchi' Mayan NMT without web scraping

Researchers introduce a data synthesis methodology for low-resource neural machine translation of Q'eqchi' Mayan, converting community-sourced dictionaries into a synthetic parallel corpus to avoid scraping target-language data. Using LoRA adapters on mT5-base, the approach reaches a BLEU score of 42.02 on in-domain evaluation but only 0.59 on organic text, revealing a structural-semantic gap. A multi-task learning ablation produced negative transfer, suggesting that the limited capacity of LoRA adapters conflicts with auxiliary objectives. The study concludes that synthetic bootstrapping is effective for structural priming but that semantic refinement requires authentic data, introduced via curriculum learning.
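
For readers unfamiliar with how scores such as 42.02 and 0.59 arise, the following is a minimal sketch of corpus-level BLEU: clipped n-gram precision for orders 1 to 4, combined by geometric mean and multiplied by a brevity penalty. It assumes whitespace tokenization, one reference per hypothesis, and no smoothing. The summary does not state which scorer or tokenization the study used, so the function names here (ngram_counts, corpus_bleu) are hypothetical and the numbers it produces are not comparable to the reported ones.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of order n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale.

    Assumptions (not taken from the study): whitespace tokenization,
    exactly one reference per hypothesis, no smoothing.
    """
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Counter intersection keeps the minimum count per n-gram,
            # which is exactly the "clipping" step of BLEU.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, a single order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


if __name__ == "__main__":
    refs = ["the cat sat on the red mat"]
    print(corpus_bleu(["the cat sat on the red mat"], refs))  # exact match
    print(corpus_bleu(["the cat sat on the mat"], refs))      # partial overlap
    print(corpus_bleu(["a dog ran away quickly"], refs))      # no overlap
```

Because the score depends on exact surface n-gram overlap, output that follows the right sentence patterns but uses different wording from an organic reference can score close to zero, which is one way to read the gap between the in-domain and organic results above.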