Computational reproducibility of Jupyter notebooks from biomedical publications

Published: 2024
Volume: 13
ISSN: 2047-217X
Container-title: GigaScience
Language: en

Author:

Samuel Sheeba¹², Mietchen Daniel³⁴⁵

Affiliation:

1. Heinz-Nixdorf Chair for Distributed Information Systems, Friedrich Schiller University Jena, Jena 07743, Germany

2. Michael Stifel Center Jena, Jena 07743, Germany

3. Ronin Institute, Montclair 07043-2314, NJ, United States

4. Institute for Globally Distributed Open Research and Education (IGDORE)

5. FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Berlin 76344, Germany

Abstract

Background: Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows, including for research publications. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications.

Approach: We address computational reproducibility at 2 levels: (i) using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks associated with publications indexed in the biomedical literature repository PubMed Central. We identified such notebooks by mining the article’s full text, trying to locate them on GitHub, and attempting to rerun them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. (ii) This study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over the course of 2 years, during which the corpus of Jupyter notebooks from articles indexed in PubMed Central has grown in a highly dynamic fashion.

Results: Out of 27,271 Jupyter notebooks from 2,660 GitHub repositories associated with 3,467 publications, 22,578 notebooks were written in Python, including 15,817 that had their dependencies declared in standard requirement files and that we attempted to rerun automatically. For 10,388 of these, all declared dependencies could be installed successfully, and we reran them to assess reproducibility. Of these, 1,203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions.

Conclusions: We zoom in on common problems and practices, highlight trends, and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
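
The notebook counts reported in the Results can be read as a funnel, where each stage is a subset of the one before it. The following minimal sketch computes the share retained at each stage, using only the figures stated in the abstract. The stage labels and the helper name `stage_rates` are illustrative assumptions and do not come from the publication.

```python
# Notebook counts as reported in the abstract; the labels are shorthand, not the authors' terms.
funnel = [
    ("all notebooks", 27_271),
    ("written in Python", 22_578),
    ("dependencies declared", 15_817),
    ("dependencies installed", 10_388),
    ("ran without errors", 1_203),
    ("identical results", 879),
]


def stage_rates(stages):
    """Return (label, count, share of first stage, share of previous stage) per stage."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / total, count / previous))
        previous = count
    return rows


for label, count, of_total, of_previous in stage_rates(funnel):
    print(f"{label:<24}{count:>7,}  {of_total:6.1%} of all  {of_previous:6.1%} of previous stage")

# Consistency check: error-free runs split into identical and differing results.
assert 879 + 324 == 1_203
```

Under this reading, roughly 11.6% of the 10,388 notebooks that were rerun completed without errors, and roughly 8.5% of them reproduced the originally reported results.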

Funder

Alfred P. Sloan Foundation

Deutsche Forschungsgemeinschaft

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giad113/55444771/giad113.pdf

Cited by 4 articles.

1. Distributed Collaboration for Data, Analysis Pipelines, and Results in Single-Cell Omics;2024-07-30

2. Best practices for data management and sharing in experimental biomedical research;Physiological Reviews;2024-07-01

3. Reproducible Research Practices in Magnetic Resonance Neuroimaging: A Review Informed by Advanced Language Models;Magnetic Resonance in Medical Sciences;2024

4. Balancing computational chemistry's potential with its environmental impact;Green Chemistry;2024