Entity · technique

Cranfield evaluation paradigm

technique · active · cranfield-evaluation-paradigm-5e8f4a0d · 1 event · first seen Jun 1, 2026

Aliases: Cranfield evaluation paradigm

Co-occurring entities

TREC · Zipf distribution · BM25 · nDCG@10 · SPECTRA

More like this (12)

STAGE-Eval · G-Eval · PoPE (Popperian Placebo-controlled Evaluation) · Evaluation on the Hub · L-Eval · Plausibility Evaluation · frontier model evaluation · T-Eval · wet lab biological research evaluation framework · Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting · HypoEval · ARC Evals

Recent events (1)

4 · arXiv · cs.AI · Jun 1, 2026

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

SPECTRA is a reproducible framework for generating synthetic information retrieval test collections, separating latent topical structure, surface text realization, and query intent generation to produce deterministic relevance oracles without human annotation. A Python prototype generated corpora up to 60,000 documents at roughly 12K–14K documents per second, with graded relevance labels for 96 queries. Controlled distractor experiments showed BM25 nDCG@10 degrading from 1.00 at 2% distractors to 0.43 at 36%, demonstrating the framework's utility for exposing retrieval system failure modes before expensive real-world collection construction. The authors position SPECTRA as a diagnostic complement to Cranfield/TREC-style evaluation rather than a replacement for human judgment.
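For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how nDCG@10 is typically computed in a Cranfield-style setup: a fixed set of graded relevance judgments (qrels) per query, scored against a system's ranked list. It is not SPECTRA's code. The linear gain, the log2 discount, and the document IDs and grades are all illustrative assumptions.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded gains.

    Assumption: linear gain with a log2 rank discount. Some evaluations
    use an exponential gain (2**g - 1) instead.
    """
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_doc_ids: document IDs in the order the system returned them.
    qrels: dict mapping document ID to graded relevance; unjudged or
           distractor documents are treated as grade 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for a single query (IDs and grades are made up).
qrels = {"d1": 2, "d2": 1, "d3": 1}

# A ranking that places every relevant document first, in ideal order.
clean_ranking = ["d1", "d2", "d3", "x1", "x2"]

# A ranking where distractors (x1, x2) push relevant documents down.
polluted_ranking = ["x1", "d1", "x2", "d2", "d3"]

clean_score = ndcg_at_k(clean_ranking, qrels)
polluted_score = ndcg_at_k(polluted_ranking, qrels)
```

In this toy case the clean ranking scores a perfect nDCG@10, and the polluted ranking scores lower because relevant documents sit at more heavily discounted ranks. That is the general mechanism by which distractors that outrank relevant documents reduce the score.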

Evaluation and Benchmarking · Agent and Tool Ecosystem · TREC · Cranfield evaluation paradigm · Zipf distribution