The role of software in science: a knowledge graph-based analysis of software mentions in PubMed Central

Author:

Schindler David (1), Bensmann Felix (2), Dietze Stefan (2, 3), Krüger Frank (1, 4)

Affiliation:

1. Institute of Communications Engineering, University of Rostock, Rostock, Germany

2. GESIS - Leibniz Institute for the Social Sciences, Cologne, Germany

3. Heinrich-Heine-University, Düsseldorf, Germany

4. Department Knowledge, Culture & Transformation, University of Rostock, Rostock, Germany

Abstract

Science across all disciplines has become increasingly data-driven, creating additional demand for software to collect, process and analyse data. Transparency about the software used in the scientific process is therefore crucial for understanding the provenance of individual research data and insights, is a prerequisite for reproducibility, and can enable macro-analysis of the evolution of scientific methods over time. However, a lack of rigor in software citation practices makes the automated detection and disambiguation of software mentions a challenging problem. In this work, we provide a large-scale analysis of software usage and citation practices, facilitated by an unprecedented knowledge graph of software mentions and affiliated metadata. The graph was generated by supervised information extraction models trained on a unique gold standard corpus and applied to more than 3 million scientific articles. Our information extraction approach distinguishes different types of software and mentions, disambiguates mentions, and significantly outperforms the state of the art. It yields the most comprehensive corpus of 11.8 M software mentions, described through a knowledge graph consisting of more than 300 M triples. Our analysis provides insights into the evolution of software usage and citation patterns across various fields, ranks of journals, and impact of publications. To the best of our knowledge, this is the most comprehensive analysis of software use and citation to date. All data and models are shared publicly to facilitate further research into the scientific use and citation of software.

Funder

Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) SFB 1270/2

ScienceLinker

DFG

Publisher

PeerJ

Subject

General Computer Science


Cited by 20 articles.

1. FAIRsoft—a practical implementation of FAIR principles for research software;Bioinformatics;2024-07-22

2. How do official software citation formats evolve over time? A longitudinal analysis of R programming language packages;Scientometrics;2024-06-14

3. FAIRe Gesundheitsdaten im nationalen und internationalen Datenraum;Bundesgesundheitsblatt - Gesundheitsforschung - Gesundheitsschutz;2024-05-15

4. Bidirectional Paper-Repository Tracing in Software Engineering;Proceedings of the 21st International Conference on Mining Software Repositories;2024-04-15

5. Special issue on software citation, indexing, and discoverability;PeerJ Computer Science;2024-03-26
