dataset
SciTraj
dataset · active · provisional
scitraj-1dd35eec · 1 event · first seen 47h ago
Aliases: SciTraj
Recent events (1)
SciTraj: Claim-grounded typed citation graph for tracing research trajectories in NLP, ML, and CV
Researchers introduce SciTraj, a corpus of 32,559 papers from NLP, ML, and Computer Vision (2015–2024) connected by 573,126 directed typed citation edges, where each edge is grounded to the specific claim sentence motivating the citation. Six relation types (four NLI-verified, two similarity-gated) capture how papers extend, dispute, or realize prior work, going beyond homogeneous citation graphs. The corpus includes 287M typed trajectories and a temporally split link-prediction benchmark, enabling analysis of disciplinary siloing and topic emergence patterns. Findings highlight concentrated growth in Vision and LLM-related work.
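As a rough illustration of the structure described above, a claim-grounded typed edge and a temporal split could be sketched as follows. Everything here is an assumption made for illustration: the field names, the relation labels ("extends", "disputes", "realizes"), the toy claim sentences, and the rule of splitting on the citing paper's year are hypothetical and are not taken from the SciTraj release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypedEdge:
    citing: str    # id of the citing paper
    cited: str     # id of the cited paper
    relation: str  # hypothetical label; the six SciTraj type names are not listed in the summary
    claim: str     # claim sentence that grounds the citation
    year: int      # publication year of the citing paper


def temporal_split(edges, cutoff_year):
    """Put edges dated up to cutoff_year in train and later edges in test."""
    train = [e for e in edges if e.year <= cutoff_year]
    test = [e for e in edges if e.year > cutoff_year]
    return train, test


# Toy edges, invented for this sketch only.
edges = [
    TypedEdge("p2", "p1", "extends", "We build on the encoder of [1].", 2018),
    TypedEdge("p3", "p1", "disputes", "Unlike [1], we find no gain from pretraining.", 2021),
    TypedEdge("p4", "p2", "realizes", "We implement the method proposed in [2].", 2023),
]

train, test = temporal_split(edges, cutoff_year=2020)
```

Splitting by time rather than at random keeps future citations out of the training set, which is the usual motivation for a temporally split link-prediction benchmark.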