Entity · technique

prompt sensitivity

technique · active · prompt-sensitivity-2add0c81 · 1 event · first seen May 22, 2026

Aliases: prompt sensitivity

Co-occurring entities

MTEB · embedding model leaderboard · instruction embedding models

More like this (12)

few-shot prompting · human response time · prompt-optimizer · Prompt Caching · prompt injection · X-Sensitive · embodiment-aware prompt conditioning · prompt optimisation · stance detection · indirect prompt injection · item response theory · Cross-Event Prompt Fusion

Recent events (1)

6 · arXiv · cs.CL · May 22, 2026

Instruction Sensitivity Undermines Embedding Model Evaluation: Single-Prompt Benchmarks Are Insufficient

This paper presents an empirical study of prompt sensitivity in instruction-tuned embedding models, covering 6 models, 11 datasets, and 15 task-specific prompts per dataset (6 × 11 × 15 = 990 evaluations in total). The authors show that single-prompt evaluation systematically misrepresents model performance: depending on its phrasing, a default prompt can either understate or overstate a model's capabilities. A key finding is that leaderboard rankings are not robust, because any model in the study can be promoted to first place by selecting prompts favorably. The authors recommend that benchmarks incorporate prompt robustness metrics, either through multi-prompt evaluation or by reporting sensitivity alongside point estimates.
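As an illustration of what reporting sensitivity alongside point estimates could look like, here is a minimal Python sketch. It is not the paper's code or its metric definitions: the model names and scores are hypothetical toy values invented for the example, and the helper functions (`sensitivity_report`, `single_prompt_winners`, `can_reach_first`) are assumptions. In particular, `can_reach_first` assumes that "selecting prompts favorably" means taking the target model's best prompt against every competitor's worst prompt, which is only one possible reading.

```python
from statistics import mean, pstdev

# Grid size stated in the summary: 6 models x 11 datasets x 15 prompts.
N_EVALUATIONS = 6 * 11 * 15  # 990

# Hypothetical toy scores (invented for illustration, not from the paper).
# scores[model][i] is the score of `model` under prompt i on one dataset.
scores = {
    "model_a": [0.62, 0.70, 0.66],
    "model_b": [0.68, 0.64, 0.61],
    "model_c": [0.65, 0.63, 0.69],
}


def sensitivity_report(scores):
    """Per-model mean, population std, and max-min range across prompts."""
    return {
        model: {
            "mean": mean(vals),
            "std": pstdev(vals),
            "range": max(vals) - min(vals),
        }
        for model, vals in scores.items()
    }


def single_prompt_winners(scores):
    """Top-ranked model for each prompt index when all models share prompt i."""
    n_prompts = len(next(iter(scores.values())))
    return [max(scores, key=lambda m: scores[m][i]) for i in range(n_prompts)]


def can_reach_first(scores, target):
    """Assumed reading of favorable selection: the target's best prompt
    beats every other model's worst prompt."""
    best = max(scores[target])
    return all(best > min(vals) for m, vals in scores.items() if m != target)


report = sensitivity_report(scores)
winners = single_prompt_winners(scores)
promotable = {m: can_reach_first(scores, m) for m in scores}
```

In this toy data the top-ranked model differs for each of the three prompts, which is the kind of ranking instability the summary describes. A benchmark following the recommendation would publish the `mean` together with `std` or `range` rather than a single-prompt score.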

Evaluation and Benchmarking · Agent and Tool Ecosystem · MTEB · embedding model leaderboard · prompt sensitivity