Entity · dataset

AG-MG Parallel Corpus

datasetactiveag-mg-parallel-corpus-8bb7f3b0·1 events·first seen May 19, 2026

Aliases: AG-MG Parallel Corpus

Co-occurring entities

M2M100 VecAlign NLLB Gemini-2.5-Flash-Lite Llama-Krikri-8B QLoRA LaBSE

More like this (12)

SearchGen-Corpus-1M VUA Metaphor Corpus ARC-AGI Komi-Yazva–Russian Parallel Corpus DualG-MRAG MyAG DualG-MRAG: Decoupling Macro-Reasoning and Micro-Matching for Multimodal Retrieval-Augmented Generation GGML Reeve Foundation Multilingual Corpus AG-UI Protocol AG-UI Direct Corpus Interaction

Recent events (1)

4arXiv · cs.CL·May 19, 2026·source ↗

Ancient Greek to Modern Greek Machine Translation: Novel Benchmark and Fine-Tuning Experiments

Researchers introduce the AG-MG Parallel Corpus, a 132,481 sentence-pair dataset for Ancient Greek to Modern Greek machine translation, created via a pipeline combining web scraping, VecAlign with LaBSE embeddings, and Gemini 2.5 Flash-based alignment correction. The paper benchmarks NMT models (NLLB, M2M100) and a Greek LLM (Llama-Krikri-8B) under three fine-tuning strategies. Full-parameter fine-tuning of Llama-Krikri-8B achieves the best BLEU score of 13.16, while QLoRA-adapted M2M100-1.2B shows the largest relative gains (+10.3 BLEU). This represents the first comprehensive MT benchmark for this low-resource language pair.

Evaluation and Benchmarking Open Weights Progress M2M100 VecAlign NLLB +5 more