Entity · dataset

mJudge

dataset · active · mjudge-eea479f4 · 1 event · first seen May 28, 2026

Aliases: mJudge

Co-occurring entities

Basque language · LLM-as-a-Judge · HiTZ Center (hitz-zentroa) · Spanish language

More like this (12)

BabelJudge · AGC-Judge · LLM-as-a-Judge · LLM-judge scoring · Judge Arena · LawBench · SAM Audio Judge · Judge-Pluralis · JAM (Judge for Adaptive Metric-Alignment) · MLE Bench Lite · TuneJury · JobBench

Recent events (1)

arXiv · cs.CL · May 28, 2026

Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

This paper systematically investigates strategies for extending LLM-based automatic evaluation (LLMs-as-a-Judge) to multilingual settings, covering a high-, a mid-, and a low-resource language (English, Spanish, and Basque, respectively). The authors compare instruction translation, monolingual vs. multilingual supervision, and model size. They find that fine-tuned smaller models can match proprietary models when in-domain data is available, while zero-shot larger models are preferable out-of-domain. Two meta-evaluation datasets are extended to Spanish and Basque, and all data and code are publicly released.

Evaluation and Benchmarking · Basque language · LLM-as-a-Judge · mJudge