Abstract
Data sharing is necessary to maximize the actionable knowledge generated from research data. Data challenges can encourage secondary analyses of datasets. Data challenges in biomedicine often rely on advanced cloud-based computing infrastructure and expensive industry partnerships. Examples include challenges that use Google Cloud virtual machines and the Sage Bionetworks DREAM Challenges platform. Such robust infrastructures can be financially prohibitive for investigators without substantial resources. Given the potential to develop scientific and clinical knowledge and the NIH emphasis on data sharing and reuse, there is a need for inexpensive, computationally lightweight methods for sharing data and hosting data challenges. To fill that gap, we developed a workflow that allows for reproducible model training, testing, and evaluation. We leveraged public GitHub repositories, open-source computational languages, and Docker technology. In addition, we conducted a data challenge using the infrastructure we developed. In this manuscript, we report on the infrastructure, workflow, and data challenge results. The infrastructure and workflow are likely to be useful for future data challenges and for education.
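To make the approach concrete, below is a minimal sketch of the kind of containerized, GitHub-hosted evaluation setup the abstract describes. The base image, file names (requirements.txt, evaluate.py), and image tag are illustrative assumptions, not the authors' actual configuration.

```dockerfile
# Hypothetical Dockerfile illustrating a lightweight, reproducible
# challenge environment built from open-source tools and Docker.
# Base image and script names are assumptions for illustration only.
FROM python:3.10-slim

WORKDIR /app

# Pin dependencies so every participant trains and evaluates
# against an identical software environment
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the challenge code, e.g. cloned from a public GitHub repository
COPY . .

# Run model training, testing, and evaluation in one reproducible step
CMD ["python", "evaluate.py"]
```

Under these assumptions, an organizer or participant could reproduce an entry with `docker build -t challenge-entry . && docker run challenge-entry`, avoiding any dependence on paid cloud infrastructure.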
Funder
U.S. Department of Health & Human Services | NIH | Eunice Kennedy Shriver National Institute of Child Health and Human Development
Publisher
Springer Science and Business Media LLC