Abstract
AbstractCitation analysis generally assumes that each citation documents causal knowledge transfer that informed the conception, design, or execution of the main experiments. Citations may exist for other reasons. In this paper we identify a subset of citations that are unlikely to represent causal knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify such citations. The model relies only on the title, abstract, and reference set and not the full-text or future citations patterns, making it suitable for publications as soon as they are released, or those behind a paywall. We find that the model identifies, with high prediction scores, citations that were likely added during the peer review process, and conversely identifies with low prediction scores citations that are known to represent causal knowledge transfer. Using the model, we find that federally funded biomedical research publications represent 30% of the estimated causal knowledge transfer from basic studies to clinical research, even though these comprise only 10% of the literature, a three-fold overrepresentation in this important type of knowledge transfer. This finding underscores the importance of federal funding as a policy lever to improve human health.
Publisher
Cold Spring Harbor Laboratory