Entity · benchmark

MTEB

benchmark · active · mteb-bf5dc1c6 · 5 events · first seen May 18, 2026

Aliases: MTEB

Co-occurring entities

Hugging Face · SkMTEB · e5_large · e5-sk-small · Multilingual E5 · embedding model leaderboard · prompt sensitivity · instruction embedding models · RTEB · Mistral AI · MT-Bench · Mistral-embed · Mistral 7B Instruct v0.2 · Direct Preference Optimization (DPO) · Mistral-medium · NVIDIA · La Plateforme · Mixtral 8x7B · TensorRT-LLM

More like this (12)

MMTEB · SkMTEB · RTEB · MT-Bench · MTIA 450 · Abeba Birhane · MTIA 300 · Machine Translation (MT) · MATH · MTIA 500 · MED · ETTm2

Recent events (5)

3 · arXiv · cs.LG · Jun 12, 2026 · source ↗

SkMTEB: First comprehensive MTEB-style text embedding benchmark for Slovak with adapted E5 models

Researchers introduce SkMTEB, the first MTEB-style embedding benchmark for Slovak, covering 31 datasets across 7 task types, roughly 4× the existing multilingual benchmark coverage for the language. An evaluation of 31 embedding models shows that large instruction-tuned multilingual models outperform Slovak-specific NLU models on embedding tasks. The authors also release e5-sk-small (45M parameters) and e5-sk-large (365M parameters), derived from Multilingual E5 via vocabulary trimming and fine-tuning. These models perform competitively with proprietary APIs while reducing model size by up to 62%.

Evaluation and Benchmarking · Open Weights Progress · MTEB · SkMTEB · e5_large · +2 more

6 · arXiv · cs.CL · May 22, 2026 · source ↗

Instruction Sensitivity Undermines Embedding Model Evaluation: Single-Prompt Benchmarks Are Insufficient

This paper presents an empirical study of prompt sensitivity in instruction-tuned embedding models, covering 6 models, 11 datasets, and 15 task-specific prompts per dataset (990 total evaluations). The authors demonstrate that single-prompt evaluation systematically misrepresents true model performance, with default prompts both understating and overstating capabilities depending on phrasing. A key finding is that leaderboard rankings are not robust: by selecting prompts favorably, any model in the study can be promoted to first place. The authors recommend that benchmarks incorporate prompt robustness metrics, either through multi-prompt evaluation or by reporting sensitivity alongside point estimates.
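The ranking-instability claim above can be illustrated with a minimal sketch. The scores, model names, and helper functions below are hypothetical and are not taken from the paper; the sketch also assumes one particular reading of "selecting prompts favorably", namely that a model's best prompt is compared against each rival's worst prompt. Only the study dimensions (6 models, 11 datasets, 15 prompts) come from the summary.

```python
# Study dimensions as reported in the summary: 6 models x 11 datasets x 15 prompts.
N_MODELS, N_DATASETS, N_PROMPTS = 6, 11, 15
TOTAL_EVALUATIONS = N_MODELS * N_DATASETS * N_PROMPTS  # 990


def rank_models(scores, chosen_prompt):
    """Rank models (best first) using the score at each model's chosen prompt index."""
    picked = {model: s[chosen_prompt[model]] for model, s in scores.items()}
    return sorted(picked, key=picked.get, reverse=True)


def sensitivity_report(scores):
    """Report mean, min, max, and spread across prompts for each model."""
    return {
        model: {
            "mean": sum(s) / len(s),
            "min": min(s),
            "max": max(s),
            "spread": max(s) - min(s),
        }
        for model, s in scores.items()
    }


def can_be_first(scores, target):
    """True if the target's best prompt beats every rival's worst prompt.

    Assumption: this is one possible formalisation of favorable prompt
    selection, not necessarily the procedure used in the paper.
    """
    best = max(scores[target])
    return all(best > min(s) for model, s in scores.items() if model != target)


# Hypothetical scores: three made-up models, four prompts each, one dataset.
toy_scores = {
    "model_a": [0.62, 0.58, 0.66, 0.60],
    "model_b": [0.64, 0.55, 0.61, 0.63],
    "model_c": [0.59, 0.65, 0.57, 0.61],
}

default_ranking = rank_models(toy_scores, {m: 0 for m in toy_scores})
promotable = [m for m in toy_scores if can_be_first(toy_scores, m)]
report = sensitivity_report(toy_scores)
```

In this toy data the default prompt (index 0) ranks model_b first, yet every model can be promoted to first place under some prompt choice. Reporting the spread next to the mean is one way to surface that instability alongside a point estimate.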

Evaluation and Benchmarking · Agent and Tool Ecosystem · MTEB · embedding model leaderboard · prompt sensitivity · +1 more

6 · Hugging Face Blog · May 19, 2026 · source ↗

MTEB: Massive Text Embedding Benchmark

MTEB (Massive Text Embedding Benchmark) is introduced as a large-scale benchmark for evaluating text embedding models across a wide variety of tasks and datasets. The benchmark covers multiple embedding task types including classification, clustering, retrieval, and semantic similarity, enabling systematic comparison of embedding models. It provides a public leaderboard to track progress in the text embedding space. The work addresses the lack of a unified, comprehensive evaluation framework for text embeddings.

Evaluation and Benchmarking · MTEB · Hugging Face

5 · Hugging Face Blog · May 19, 2026 · source ↗

Introducing RTEB: A New Standard for Retrieval Evaluation

Hugging Face introduces RTEB (Retrieval Embedding Benchmark), a new benchmark designed to standardize evaluation of retrieval systems and text embeddings. The benchmark aims to address gaps in existing evaluation frameworks by providing more comprehensive and realistic retrieval tasks. It is intended to improve how the community measures progress in retrieval-augmented generation and semantic search systems.

Evaluation and Benchmarking · Agent and Tool Ecosystem · MTEB · RTEB · Hugging Face

7 · Mistral AI News · May 18, 2026 · source ↗

Mistral AI Launches La Plateforme: First API Endpoints in Early Access

Mistral AI opened beta access to its first developer platform, La Plateforme, offering three generative text endpoints (mistral-tiny, mistral-small, mistral-medium) and an embedding endpoint. Mistral-tiny serves Mistral 7B Instruct v0.2, mistral-small serves Mixtral 8x7B, and mistral-medium serves an unreleased prototype model scoring 8.6 on MT-Bench. The embedding endpoint, Mistral-embed, serves a model with a 1024-dimension embedding that achieves a retrieval score of 55.26 on MTEB. The API follows OpenAI-compatible chat interface specifications and is ramping toward general availability.

Frontier Model Releases · Open Weights Progress · MTEB · Mistral AI · MT-Bench · +11 more