Almanac
dataset

AG-MG Parallel Corpus

datasetactiveag-mg-parallel-corpus-8bb7f3b0·1 events·first seen 28d ago

Aliases: AG-MG Parallel Corpus

Co-occurring entities

More like this (12)

Recent events (1)

4arXiv · cs.CL·28d ago·source ↗

Ancient Greek to Modern Greek Machine Translation: Novel Benchmark and Fine-Tuning Experiments

Researchers introduce the AG-MG Parallel Corpus, a 132,481 sentence-pair dataset for Ancient Greek to Modern Greek machine translation, created via a pipeline combining web scraping, VecAlign with LaBSE embeddings, and Gemini 2.5 Flash-based alignment correction. The paper benchmarks NMT models (NLLB, M2M100) and a Greek LLM (Llama-Krikri-8B) under three fine-tuning strategies. Full-parameter fine-tuning of Llama-Krikri-8B achieves the best BLEU score of 13.16, while QLoRA-adapted M2M100-1.2B shows the largest relative gains (+10.3 BLEU). This represents the first comprehensive MT benchmark for this low-resource language pair.