Almanac
organization

PubMed Central

organization · active · provisional · pubmed-central-d4449b5c · 1 event · first seen 4d ago

Aliases: PubMed Central

Recent events (1)

4 · arXiv · cs.CL · 4d ago

Quality-aware training data selection improves scientific summarization via 1.88M PMC article dataset

Researchers construct and release one of the largest biomedical long-document summarization datasets, comprising 1.88 million PMC articles, and analyze the quality of author-written abstracts as gold reference summaries. Using source-grounded and model-based metrics, they show that abstract quality varies substantially and that training on high-quality subsets outperforms random sampling at matched sizes. The work demonstrates that quality-aware data selection can match or exceed larger random training sets on factuality-oriented metrics, suggesting reference quality is a key lever for training efficiency in scientific summarization.
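The matched-size comparison described above can be illustrated with a minimal sketch. It assumes each article record already carries a numeric reference-quality score. The `quality` field, the `select_matched_subsets` helper, and the toy scores are hypothetical and are not taken from the paper, which does not specify its scoring here.

```python
import random


def select_matched_subsets(records, k, score_key="quality", seed=0):
    """Return (top_k_by_quality, random_k), both drawn from the same pool.

    `score_key` names a hypothetical per-record quality score; the actual
    metrics used in the paper are not reproduced here.
    """
    if k > len(records):
        raise ValueError("k exceeds pool size")
    # Quality-aware subset: the k records with the highest score.
    top = sorted(records, key=lambda r: r[score_key], reverse=True)[:k]
    # Baseline subset: k records sampled uniformly without replacement.
    rng = random.Random(seed)
    rand = rng.sample(records, k)
    return top, rand


# Toy pool of six records with made-up quality scores.
pool = [
    {"id": i, "quality": q}
    for i, q in enumerate([0.2, 0.9, 0.5, 0.7, 0.1, 0.8])
]
top, rand = select_matched_subsets(pool, k=3)
print([r["id"] for r in top])  # ids of the three highest-scoring records
```

Both subsets have the same size, so any difference in downstream summarization quality can be attributed to which references were selected rather than how many.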