G-IdiomAlign

benchmark · active · provisional · g-idiomalign-fe659981 · 1 event · first seen 3d ago

Aliases: G-IdiomAlign

Recent events (1)

4 · arXiv · cs.CL · 3d ago

G-IdiomAlign: Gloss-pivoted benchmark for cross-lingual idiom alignment in LLMs

Researchers introduce G-IdiomAlign, a benchmark anchoring idioms via English glosses from Wiktionary to evaluate cross-lingual idiom equivalence in LLMs. The benchmark supports two evaluation protocols: a multiple-choice task with typed distractors and a gloss-contrastive generation task isolating the effect of explicit semantic pivots. Experiments across diverse LLMs find that literal translation bias is the dominant failure mode, especially for low-resource languages, and that gloss conditioning improves performance but leaves substantial headroom. Mechanistic analysis on Qwen3-8B suggests cross-condition differences are concentrated in attention heads rather than layers.
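To make the multiple-choice protocol concrete, here is a minimal sketch of how an item with typed distractors might be represented and scored so that errors can be broken down by distractor type. The summary does not give the benchmark's schema or its distractor taxonomy, so the `Item` and `Option` classes, the `score` function, and the type labels `"literal"` and `"lexical_overlap"` are all hypothetical, and the two example items are written for illustration rather than drawn from the benchmark.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    text: str
    kind: str  # "correct", or a hypothetical distractor type label


@dataclass(frozen=True)
class Item:
    source_idiom: str   # idiom in the source language
    gloss: str          # English gloss used as the semantic pivot
    options: tuple      # candidate equivalents, one of kind "correct"


def score(items, predictions):
    """Return (accuracy, error counts keyed by distractor type).

    predictions[i] is the index of the option chosen for items[i].
    """
    if len(items) != len(predictions):
        raise ValueError("one prediction per item is required")
    kinds = Counter(
        item.options[choice].kind for item, choice in zip(items, predictions)
    )
    accuracy = kinds["correct"] / len(items) if items else 0.0
    errors = {kind: n for kind, n in kinds.items() if kind != "correct"}
    return accuracy, errors


# Illustrative items only; not taken from G-IdiomAlign.
items = [
    Item(
        source_idiom="coûter les yeux de la tête",
        gloss="to be very expensive",
        options=(
            Option("cost an arm and a leg", "correct"),
            Option("cost the eyes of the head", "literal"),
            Option("keep an eye on", "lexical_overlap"),
        ),
    ),
    Item(
        source_idiom="ins Gras beißen",
        gloss="to die",
        options=(
            Option("kick the bucket", "correct"),
            Option("bite into the grass", "literal"),
            Option("the grass is always greener", "lexical_overlap"),
        ),
    ),
]

# A model that answers the first item correctly and picks the
# word-for-word rendering on the second.
accuracy, errors = score(items, [0, 1])
```

Tagging each wrong option with a type is what lets a failure mode such as literal translation bias show up as a count, rather than being folded into a single accuracy number.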