Automated reproducibility assessments in the social and behavioral sciences using large language models

Recent events (1)

arXiv · cs.AI · 5d ago

LLMs automate reproducibility assessments in social and behavioral sciences, outperforming human reanalysts

An arXiv preprint reports that an LLM pipeline can automate reproducibility assessments of published social and behavioral science studies. Evaluated on 76 published studies with predefined claims, the pipeline recovered the original effect sizes in 41% of cases (vs. 34% for human reanalysts) and reached the same qualitative conclusion in 96% of cases (vs. 74% for humans). The results suggest that LLMs could serve as a scalable tool for systematically auditing empirical research, reducing the resource burden of traditional reproducibility efforts.