Almanac
dataset

mMARCO

dataset · active · provisional · mmarco-6fd8d8ef · 1 event · first seen 5d ago

Aliases: mMARCO

Recent events (1)

4 · arXiv · cs.CL · 5d ago

Embedding interpolation study reveals structured benefits of mixed-language queries in multilingual dense retrieval

A ratio-controlled study on mMARCO evaluates how the mixing proportion of parallel query translations, combined by embedding-level interpolation, affects multilingual dense retrieval performance. Using BGE-M3, the authors find that the optimal mixing ratio outperforms the best monolingual endpoint in 88 of 105 cases, with a clear asymmetry driven by English dominance. Mixing is uniformly beneficial for non-English document indices, whereas indices that contain English are best served by pure English queries. Mixing gains correlate negatively with typological distance once English dominance is controlled for.
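To make "embedding-level interpolation" concrete, here is a minimal sketch in Python. It assumes the mix is a linear blend of two query embeddings, alpha * q_a + (1 - alpha) * q_b, followed by L2 renormalization and dot-product scoring. The summary does not state the exact formula the authors use, so the blend form, the renormalization step, the toy 3-dimensional vectors (standing in for real BGE-M3 embeddings), and all function names are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector cannot be normalized."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def interpolate_queries(q_a, q_b, alpha):
    """Blend two query embeddings as alpha * q_a + (1 - alpha) * q_b.

    alpha = 1.0 and alpha = 0.0 are the two monolingual endpoints.
    The result is renormalized (an assumption, not taken from the study).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(q_a) != len(q_b):
        raise ValueError("embeddings must have the same dimension")
    mixed = [alpha * a + (1.0 - alpha) * b for a, b in zip(q_a, q_b)]
    return l2_normalize(mixed)


def score(query, doc):
    """Dot-product relevance score between a query and a document embedding."""
    return sum(q * d for q, d in zip(query, doc))


def sweep_ratios(q_a, q_b, doc, steps=10):
    """Score the document at evenly spaced mixing ratios from 0.0 to 1.0."""
    results = []
    for i in range(steps + 1):
        alpha = i / steps
        mixed = interpolate_queries(q_a, q_b, alpha)
        results.append((alpha, score(mixed, doc)))
    return results


def best_ratio(q_a, q_b, doc, steps=10):
    """Return the (alpha, score) pair with the highest score in the sweep."""
    return max(sweep_ratios(q_a, q_b, doc, steps), key=lambda pair: pair[1])


# Toy stand-ins for the embeddings of one query in two languages.
q_en = l2_normalize([1.0, 0.0, 0.0])
q_de = l2_normalize([0.0, 1.0, 0.0])
doc = l2_normalize([1.0, 1.0, 0.0])

alpha, top_score = best_ratio(q_en, q_de, doc)
print(f"best alpha = {alpha:.1f}, score = {top_score:.4f}")
```

In this toy setup the document lies between the two query vectors, so an intermediate ratio scores higher than either endpoint. That is only the geometric intuition behind mixing; it is not a reproduction of the reported 88-of-105 result, which depends on real embeddings and full retrieval metrics.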