
Cranfield evaluation paradigm

technique · active · provisional · cranfield-evaluation-paradigm-5e8f4a0d · 1 event · first seen 16d ago

Aliases: Cranfield evaluation paradigm

Recent events (1)

4 · arXiv · cs.AI · 16d ago

SPECTRA: Synthetic IR Test Collections with Relevance Oracles and Controlled Distractor Diagnostics

SPECTRA is a reproducible framework for generating synthetic information retrieval test collections. It separates latent topical structure, surface text realization, and query intent generation to produce deterministic relevance oracles without human annotation. A Python prototype generated corpora of up to 60,000 documents at roughly 12K–14K documents per second, with graded relevance labels for 96 queries. In controlled distractor experiments, BM25 nDCG@10 degraded from 1.00 at 2% distractors to 0.43 at 36%, showing that the framework can expose retrieval system failure modes before expensive real-world collection construction. The authors position SPECTRA as a diagnostic complement to Cranfield/TREC-style evaluation rather than a replacement for human judgment.
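
For readers unfamiliar with the metric, the nDCG@10 scores quoted in the summary can be illustrated with a minimal sketch. It assumes the common linear-gain, log2-discount formulation of nDCG; the summary does not say which variant SPECTRA uses, and the function names and relevance grades below are hypothetical, not taken from the paper.

```python
import math

def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the top-k graded relevance labels.

    Assumes linear gain and a log2(rank + 1) discount with 1-based ranks.
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, all_gains, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_gains: relevance grades in the order the system returned documents.
    all_gains: relevance grades of every judged document for the query.
    """
    ideal = dcg_at_k(sorted(all_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical grades for one query: three relevant documents, two distractors.
judged = [3, 2, 1, 0, 0]

# A ranking that places relevant documents first scores 1.0.
perfect = ndcg_at_k([3, 2, 1, 0, 0], judged)

# A ranking where distractors (grade 0) take the top slots scores lower.
degraded = ndcg_at_k([0, 0, 3, 2, 1], judged)
```

In this toy case the second ranking scores below the first because the two distractors push the relevant documents into more heavily discounted positions, which is the kind of degradation the distractor experiments measure.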