Multilingual E5

model · active · provisional · multilingual-e5-bfdb77fb · 1 event · first seen 5d ago

Aliases: Multilingual E5

Recent events (1)

3 · arXiv · cs.LG · 5d ago

SkMTEB: First comprehensive MTEB-style text embedding benchmark for Slovak with adapted E5 models

Researchers introduce SkMTEB, the first MTEB-style embedding benchmark for Slovak. It covers 31 datasets across 7 task types, roughly 4× the coverage that existing multilingual benchmarks offer for the language. An evaluation of 31 embedding models shows that large instruction-tuned multilingual models outperform Slovak-specific NLU models on embedding tasks. The authors also release e5-sk-small (45M) and e5-sk-large (365M), derived from Multilingual E5 via vocabulary trimming and fine-tuning; the models perform competitively with proprietary APIs at up to a 62% reduction in size.
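
To illustrate the general idea behind vocabulary trimming, here is a minimal sketch in plain Python. It is not the authors' procedure: the function name `trim_vocabulary`, the toy vocabulary, and the rule "keep special tokens plus every token seen in a target-language corpus" are all assumptions made for illustration. The point is only that dropping unused rows from the token embedding table shrinks a multilingual model without touching the remaining weights.

```python
from typing import Dict, Iterable, List, Tuple


def trim_vocabulary(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    corpus_tokens: Iterable[str],
    always_keep: Iterable[str] = ("<pad>", "<unk>"),
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Hypothetical helper: keep only embedding rows for tokens that are needed."""
    # Tokens to keep: special tokens plus anything seen in the target-language corpus.
    needed = set(corpus_tokens) | set(always_keep)
    # Sort by original id so the surviving tokens keep their relative order.
    kept = sorted((old_id, tok) for tok, old_id in vocab.items() if tok in needed)
    # Assign new contiguous ids and copy the matching embedding rows.
    new_vocab = {tok: new_id for new_id, (_, tok) in enumerate(kept)}
    new_embeddings = [embeddings[old_id] for old_id, _ in kept]
    return new_vocab, new_embeddings


# Toy data (assumed): a six-token multilingual vocabulary with 2-dimensional embeddings.
vocab = {"<pad>": 0, "<unk>": 1, "ahoj": 2, "hello": 3, "svet": 4, "world": 5}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

# Pretend the Slovak corpus only ever produces these two tokens.
new_vocab, new_embeddings = trim_vocabulary(vocab, embeddings, ["ahoj", "svet"])

# Fraction of embedding rows removed by trimming.
reduction = 1 - len(new_vocab) / len(vocab)
print(new_vocab, f"{reduction:.0%} of rows removed")
```

In a real model the tokenizer would also have to be rebuilt to match the new ids, and fine-tuning (as the summary mentions) would typically follow; both steps are outside this sketch.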