Entity · benchmark

embedding model leaderboard

benchmark · active · embedding-model-leaderboard-ca9c7a6f · 1 event · first seen May 22, 2026

Aliases: embedding model leaderboard

Co-occurring entities

MTEB · prompt sensitivity · instruction embedding models

More like this (12)

Arena Leaderboard · Object Detection Leaderboard · AI leaderboards · embedding models · LiveCodeBench Leaderboard · Open Agent Leaderboard · Open ASR Leaderboard · Artificial Analysis Intelligence Leaderboard · Open Chain of Thought Leaderboard · reward model · Open Medical-LLM Leaderboard · LMSys Vision Leaderboard

Recent events (1)

6 · arXiv · cs.CL · May 22, 2026

Instruction Sensitivity Undermines Embedding Model Evaluation: Single-Prompt Benchmarks Are Insufficient

This paper presents an empirical study of prompt sensitivity in instruction-tuned embedding models, covering 6 models, 11 datasets, and 15 task-specific prompts per dataset (990 total evaluations). The authors demonstrate that single-prompt evaluation systematically misrepresents true model performance, with default prompts both understating and overstating capabilities depending on phrasing. A key finding is that leaderboard rankings are not robust: by selecting prompts favorably, any model in the study can be promoted to first place. The authors recommend that benchmarks incorporate prompt robustness metrics, either through multi-prompt evaluation or by reporting sensitivity alongside point estimates.
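The kind of multi-prompt reporting the summary recommends can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: the model names and scores are synthetic placeholders, the helper names (`summarize`, `can_reach_first`, `eval_count`) are hypothetical, and the "can reach first place" test assumes prompts may be chosen independently per model (a model's best prompt against each rival's worst prompt).

```python
from statistics import mean, pstdev

# Synthetic, illustrative scores only: model -> one score per prompt variant.
# These numbers are NOT from the paper.
scores = {
    "model_a": [0.62, 0.58, 0.66],
    "model_b": [0.60, 0.64, 0.57],
    "model_c": [0.55, 0.61, 0.59],
}


def summarize(scores):
    """Report a point estimate (mean) together with sensitivity (std, min, max)."""
    return {
        name: {
            "mean": mean(vals),
            "std": pstdev(vals),  # population std across prompt variants
            "min": min(vals),
            "max": max(vals),
        }
        for name, vals in scores.items()
    }


def can_reach_first(scores):
    """Assumed criterion: a model can be ranked first if its best-prompt score
    beats every other model's worst-prompt score."""
    summary = summarize(scores)
    return {
        name: all(
            summary[name]["max"] > summary[other]["min"]
            for other in scores
            if other != name
        )
        for name in scores
    }


def eval_count(n_models, n_datasets, n_prompts):
    """Size of the evaluation grid: one run per (model, dataset, prompt)."""
    return n_models * n_datasets * n_prompts


if __name__ == "__main__":
    for name, stats in summarize(scores).items():
        print(name, {k: round(v, 3) for k, v in stats.items()})
    print(can_reach_first(scores))
    print(eval_count(6, 11, 15))  # grid size for 6 models, 11 datasets, 15 prompts
```

With the study's dimensions, `eval_count(6, 11, 15)` gives the 990 evaluations cited in the summary. Reporting the spread next to the mean makes it visible when two models' score ranges overlap enough that prompt choice alone can reorder them.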

Evaluation and Benchmarking · Agent and Tool Ecosystem · MTEB · embedding model leaderboard · prompt sensitivity