Predicting article quality scores with machine learning: The U.K. Research Excellence Framework

Published: 2023 Issue: 2 Volume: 4 Page: 547-573
ISSN: 2641-3337
Container-title: Quantitative Science Studies
Language: en

Author:

Thelwall Mike¹, Kousha Kayvan¹, Wilson Paul¹, Makita Meiko¹, Abdoli Mahshid¹, Stuart Emma¹, Levitt Jonathan¹, Knoth Petr², Cancellieri Matteo²

Affiliation:

1. Statistical Cybermetrics and Research Evaluation Group, University of Wolverhampton, Wolverhampton, UK

2. Knowledge Media Institute, Open University, Milton Keynes, UK

Abstract

National research evaluation initiatives and incentive schemes choose between simplistic quantitative indicators and time-consuming peer/expert review, sometimes supported by bibliometrics. Here we assess whether machine learning could provide a third alternative, estimating article quality using multiple bibliometric and metadata inputs. We investigated this using provisional three-level REF2021 peer review scores for 84,966 articles submitted to the U.K. Research Excellence Framework 2021 that matched a Scopus record from 2014–18 and had a substantial abstract. We found that accuracy is highest in the medical and physical sciences Units of Assessment (UoAs) and economics, reaching 42% above the baseline (72% overall) in the best case. This is based on 1,000 bibliometric inputs and half of the articles used for training in each UoA. Prediction accuracies above the baseline for the social science, mathematics, engineering, arts, and humanities UoAs were much lower or close to zero. The Random Forest Classifier (standard or ordinal) and Extreme Gradient Boosting Classifier algorithms performed best of the 32 tested. Accuracy was lower if UoAs were merged or replaced by Scopus broad categories. We increased accuracy with an active learning strategy and by selecting articles with higher prediction probabilities, but this substantially reduced the number of scores predicted.
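
The trade-off noted in the abstract's final sentence, where keeping only high-probability predictions raises accuracy but reduces the number of scores predicted, can be illustrated with a minimal sketch. This is not the authors' code: the function names, the threshold values, and the toy probabilities and labels below are all hypothetical, invented only to show the mechanism for a three-level score.

```python
from typing import List, Optional, Sequence, Tuple


def select_confident(
    probs: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, int]]:
    """Return (article index, predicted class) for every article whose
    highest class probability is at least `threshold`."""
    selected = []
    for i, row in enumerate(probs):
        top = max(range(len(row)), key=lambda k: row[k])  # most probable class
        if row[top] >= threshold:
            selected.append((i, top))
    return selected


def accuracy_and_coverage(
    probs: Sequence[Sequence[float]], labels: Sequence[int], threshold: float
) -> Tuple[Optional[float], float]:
    """Accuracy on the retained predictions, and the share of articles retained."""
    chosen = select_confident(probs, threshold)
    coverage = len(chosen) / len(labels)
    if not chosen:
        return None, coverage  # nothing retained, so accuracy is undefined
    correct = sum(1 for i, pred in chosen if labels[i] == pred)
    return correct / len(chosen), coverage


# Hypothetical class probabilities for six articles and three score levels
# (illustrative values only, not data from the study).
probs = [
    [0.80, 0.15, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.70, 0.20],
    [0.30, 0.30, 0.40],
    [0.05, 0.15, 0.80],
    [0.50, 0.30, 0.20],
]
labels = [0, 1, 1, 1, 2, 0]  # hypothetical true score levels

acc_all, cov_all = accuracy_and_coverage(probs, labels, threshold=0.0)
acc_hi, cov_hi = accuracy_and_coverage(probs, labels, threshold=0.7)
print(f"all predictions:  accuracy={acc_all:.2f}, coverage={cov_all:.2f}")
print(f"threshold 0.70:   accuracy={acc_hi:.2f}, coverage={cov_hi:.2f}")
```

In this toy example, raising the threshold keeps only half of the articles, and all of the retained predictions happen to be correct. With a real classifier the gain would depend on how well calibrated its probabilities are.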

Funder

Research England, Scottish Funding Council, Higher Education Funding Council for Wales, and Department for the Economy, Northern Ireland

Publisher

MIT Press

Subject

Library and Information Sciences, Cultural Studies, Numerical Analysis, Analysis

Link

https://direct.mit.edu/qss/article-pdf/4/2/547/2136363/qss_a_00258.pdf

Reference: 72 articles.

1. Are the authors of highly cited articles also the most productive ones?; Abramo; Journal of Informetrics, 2014

2. Predicting citation counts based on deep neural network learning techniques; Abrishami; Journal of Informetrics, 2019

3. Early indicators of scientific impact: Predicting citations with altmetrics; Akella; Journal of Informetrics, 2021

4. SciBERT: A pretrained language model for scientific text; Beltagy, 2019

5. The Matthew effect in science funding; Bol; Proceedings of the National Academy of Sciences, 2018

Cited by 10 articles.

1. Two circuit assessments of the performance of scientific organizations in Russia: current state and development prospects from the point of view of international experience;Вестник Российской академии наук;2024-07-04

2. Content-based quality evaluation of scientific papers using coarse feature and knowledge entity network;Journal of King Saud University - Computer and Information Sciences;2024-07

3. Relationships between expert ratings of business/economics journals and key citation metrics: The impact of size-independence, citing-journal weighting, and subject-area normalization;The Journal of Academic Librarianship;2024-07

4. British education research and its quality: An analysis of Research Excellence Framework submissions;British Educational Research Journal;2024-06-05

5. Electricity Production Prediction by Microsoft Azure Machine Learning Service and Python User Blocks;Advances in Environmental Engineering and Green Technologies;2024-05-17