ReCiter: An open source, identity-driven, authorship prediction algorithm optimized for academic institutions

Published:2021-04-01 Issue:4 Volume:16 Page:e0244641
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Albert Paul J., Dutta Sarbajit, Lin Jie, Zhu Zimeng, Bales Michael, Johnson Stephen B., Mansour Mohammad, Wright Drew, Wheeler Terrie R., Cole Curtis L.

Abstract

Academic institutions need to maintain publication lists for thousands of faculty and other scholars. Automated tools are essential to minimize the need for direct feedback from the scholars themselves, who are practically unable to commit the necessary effort to keep the data accurate. In relying exclusively on clustering techniques, author disambiguation applications fail to satisfy key use cases of academic institutions. Algorithms can perfectly group together a set of publications authored by a common individual, but, for them to be useful to an academic institution, they need to programmatically and recurrently map articles to thousands of scholars of interest en masse. Consistent with a savvy librarian’s approach for generating a scholar’s list of publications, identity-driven authorship prediction is the process of using information about a scholar to quantify the likelihood that person wrote certain articles. ReCiter is an application that attempts to do exactly that. ReCiter uses institutionally maintained identity data such as name of department and year of terminal degree to predict which articles a given scholar has authored. To compute the overall score for a given candidate article from PubMed (and, optionally, Scopus), ReCiter uses: up to 12 types of commonly available identity data; whether other members of a cluster have been accepted or rejected by a user; and the average score of a cluster. In addition, ReCiter provides scoring and qualitative evidence supporting why particular articles are suggested. This context and confidence scoring allows curators to more accurately provide feedback on behalf of scholars. To help users curate publication lists more efficiently, we used a support vector machine analysis to optimize the scoring of the ReCiter algorithm. In our analysis of a diverse test group of 500 scholars at an academic private medical center, ReCiter correctly predicted 98% of their publications in PubMed.
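
The three scoring inputs named in the abstract (identity evidence, user feedback on other members of a cluster, and the average score of a cluster) can be illustrated with a minimal sketch. It is not ReCiter's implementation: the evidence types, the weights, the additive combination, and every name below (WEIGHTS, Article, identity_score, score_articles) are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Hypothetical evidence weights. The abstract says ReCiter uses up to 12
# evidence types with SVM-optimized scoring; these three are placeholders.
WEIGHTS = {
    "name_match": 3.0,
    "department_match": 1.5,
    "degree_year_plausible": 1.0,
}


@dataclass
class Article:
    pmid: str
    evidence: dict            # evidence type -> strength in [0, 1]
    cluster_id: int
    feedback: str = "none"    # "accepted", "rejected", or "none"
    score: float = 0.0


def identity_score(article):
    """Weighted sum of identity evidence for one candidate article."""
    return sum(WEIGHTS.get(k, 0.0) * v for k, v in article.evidence.items())


def score_articles(articles, feedback_bonus=1.0, cluster_weight=0.5):
    """Combine identity evidence, feedback on cluster peers, and the
    cluster's mean identity score (assumed additive), then rank."""
    base = {a.pmid: identity_score(a) for a in articles}

    # Group candidate articles by their (precomputed) cluster.
    clusters = {}
    for a in articles:
        clusters.setdefault(a.cluster_id, []).append(a)

    for members in clusters.values():
        mean_base = sum(base[m.pmid] for m in members) / len(members)
        for a in members:
            # Only feedback on OTHER members of the cluster counts here.
            others = [m for m in members if m is not a]
            accepted = sum(m.feedback == "accepted" for m in others)
            rejected = sum(m.feedback == "rejected" for m in others)
            a.score = (
                base[a.pmid]
                + feedback_bonus * (accepted - rejected)
                + cluster_weight * mean_base
            )

    return sorted(articles, key=lambda a: a.score, reverse=True)
```

In this sketch, an article with weak identity evidence still rises in the ranking when a curator has already accepted another article in the same cluster, which is the behavior the abstract describes in general terms.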

Funder

National Center For Advancing Translational Sciences

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

Reference: 33 articles.

1. Giles CL, Zha H, Han H. Name disambiguation in author citations using a k-way spectral clustering method. Proceedings of the 5th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL ’05). 2005.

2. Liu W. Author name disambiguation for PubMed. J Assoc Inf Sci Technol. 2014.

3. Han H, Giles L, Zha H, Li C, Tsioutsiouliklis K. Two supervised learning approaches for name disambiguation in author citations. Proceedings of the 2004 joint ACM/IEEE conference on Digital libraries—JCDL ‘04. New York, New York, USA: ACM Press; 2004. p. 296. doi: 10.1145/996350.996419

Cited by 5 articles.

1. ANDez: An open-source tool for author name disambiguation using machine learning;SoftwareX;2024-05

2. ORCID coverage in research institutions—Readiness for partially automated research reporting;Frontiers in Research Metrics and Analytics;2022-11-10

3. Transforming and extending library services by embracing technology and collaborations: A case study;Health Information & Libraries Journal;2022-06-22

4. TeamTree analysis: A new approach to evaluate scientific production;PLOS ONE;2021-07-21

5. TeamTree analysis: a new approach to evaluate scientific production;2020-06-02