benchmark
MetaSyn
benchmarkactiveprovisional
metasyn-fb26153d·1 events·first seen 26h agoAliases: MetaSyn
Co-occurring entities
More like this (12)
Recent events (1)
MetaSyn benchmark reveals critical screening bottleneck in LLM-based meta-analysis pipelines
Researchers introduce MetaSyn, a dataset of 442 expert-curated meta-analyses from Nature Portfolio journals, paired with a 140k-article PubMed retrieval corpus, PI/ECO criteria, verified positives, and hard negatives. Benchmarking twelve pipeline configurations — nine RAG variants and a protocol-driven agent — shows that despite 90.9% retrieval recall at K=200, no system recovers more than 52.7% of ground-truth included studies. The core failure is LLMs' inability to reliably distinguish eligible studies from topically similar but criteria-failing distractors. The paper argues that end-to-end scores obscure where pipelines break down and proposes stage-attributed metrics.