Wiktionary

dataset · active · provisional · wiktionary-b9fd6a85 · 1 event · first seen 2d ago

Aliases: Wiktionary

Recent events (1)

4 · arXiv · cs.CL · 2d ago

G-IdiomAlign: Gloss-pivoted benchmark for cross-lingual idiom alignment in LLMs

Researchers introduce G-IdiomAlign, a benchmark that anchors idioms to English glosses from Wiktionary in order to evaluate cross-lingual idiom equivalence in LLMs. The benchmark supports two evaluation protocols: a multiple-choice task with typed distractors, and a gloss-contrastive generation task that isolates the effect of explicit semantic pivots. Experiments across diverse LLMs find that literal translation bias is the dominant failure mode, especially for low-resource languages, and that gloss conditioning improves performance but leaves substantial headroom. Mechanistic analysis on Qwen3-8B suggests that cross-condition differences are concentrated in attention heads rather than layers.