SkMTEB
benchmark·active·provisional
skmteb-92b1804b·1 event·first seen 5d ago
Aliases: SkMTEB
Recent events (1)
SkMTEB: First comprehensive MTEB-style text embedding benchmark for Slovak, released alongside adapted E5 models
Researchers introduce SkMTEB, the first MTEB-style embedding benchmark for Slovak. It covers 31 datasets across 7 task types, roughly 4× the coverage that existing multilingual benchmarks offer for the language. An evaluation of 31 embedding models shows that large instruction-tuned multilingual models outperform Slovak-specific NLU models on embedding tasks. The authors also release e5-sk-small (45M parameters) and e5-sk-large (365M parameters), derived from Multilingual E5 via vocabulary trimming and fine-tuning; these models perform competitively with proprietary APIs at a size reduction of up to 62%.
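
The summary names vocabulary trimming without describing it, so the following is only a generic sketch of the idea, not the authors' procedure: keep the embedding rows for tokens that actually occur in a target-language corpus (plus required special tokens) and drop the rest. The function name `trim_vocabulary`, the toy vocabulary, and the toy vectors are all hypothetical and chosen for illustration.

```python
def trim_vocabulary(vocab, embeddings, corpus_tokens, keep_always=()):
    """Keep only embedding rows for tokens seen in the target-language corpus.

    vocab: dict mapping token -> row index in the original embedding matrix
    embeddings: list of rows (one list of floats per token)
    corpus_tokens: iterable of tokens observed in the target-language corpus
    keep_always: special tokens that must survive even if unseen
    """
    wanted = set(corpus_tokens) | set(keep_always)
    # Preserve the original relative order so the new ids are deterministic.
    kept = sorted((tok for tok in vocab if tok in wanted), key=vocab.__getitem__)
    new_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    new_embeddings = [list(embeddings[vocab[tok]]) for tok in kept]
    return new_vocab, new_embeddings


# Toy data, invented for illustration only (not taken from the SkMTEB work).
vocab = {"<pad>": 0, "<unk>": 1, "ahoj": 2, "hello": 3, "svet": 4, "world": 5}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]
slovak_corpus = ["ahoj", "svet", "ahoj"]

new_vocab, new_embeddings = trim_vocabulary(
    vocab, embeddings, slovak_corpus, keep_always=("<pad>", "<unk>")
)

# Fraction of embedding rows removed by trimming.
reduction = 1 - len(new_embeddings) / len(embeddings)
print(new_vocab)
print(f"embedding rows removed: {reduction:.0%}")
```

In a multilingual model the token embedding matrix can account for a large share of the parameters, which is why removing unused rows can shrink the model substantially; the trimmed model would normally be fine-tuned afterwards, as the summary indicates.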