Author:
Barberis Alessandro, Aerts Hugo J. W. L., Buffa Francesca M.
Abstract
Artificial intelligence (AI) techniques are increasingly applied across various domains, favoured by the growing acquisition and public availability of large, complex datasets. Despite this trend, AI publications often suffer from a lack of reproducibility and poor generalisation of findings, undermining scientific value and contributing to global research waste. To address these issues, and focusing on the learning aspect of AI, we present RENOIR (REpeated random sampliNg fOr machIne leaRning), a modular open-source platform for robust and reproducible machine learning (ML) analysis. RENOIR adopts standardised pipelines for model training and testing and introduces novel elements, such as an evaluation of how algorithm performance depends on sample size. Additionally, RENOIR offers automated generation of transparent and usable reports, aiming to enhance the quality and reproducibility of AI studies. To demonstrate the versatility of our tool, we applied it to benchmark datasets from health, computer science, and STEM (Science, Technology, Engineering, and Mathematics) domains. Furthermore, we showcase RENOIR’s successful application in recently published studies, where it identified classifiers for SETD2 and TP53 mutation status in cancer. Finally, we present a use case where RENOIR was employed to address a significant pharmacological challenge: predicting drug efficacy. RENOIR is freely available at https://github.com/alebarberis/renoir.
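The idea of assessing performance as a function of sample size through repeated random sampling can be illustrated with a minimal Python sketch. This is not RENOIR's API or implementation: the function names (`make_data`, `fit_centroids`, `learning_curve`), the synthetic one-dimensional data, and the nearest-centroid learner are all assumptions chosen to keep the example self-contained.

```python
import random
import statistics


def make_data(n_per_class, rng):
    """Synthetic balanced two-class data (assumption): class c ~ N(2c, 1)."""
    return [(rng.gauss(2.0 * c, 1.0), c) for c in (0, 1) for _ in range(n_per_class)]


def fit_centroids(train):
    """Toy learner standing in for a real model: one mean per class."""
    return {c: statistics.fmean(x for x, y in train if y == c) for c in (0, 1)}


def accuracy(centroids, test):
    """Fraction of test points whose nearest centroid matches their label."""
    hits = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == y for x, y in test
    )
    return hits / len(test)


def learning_curve(pool, test, sizes_per_class, repeats, rng):
    """For each training size, repeatedly draw a random stratified training
    set from the pool, fit, and score on the fixed held-out test set.
    Returns {size: (mean accuracy, standard deviation across repeats)}."""
    by_class = {c: [p for p in pool if p[1] == c] for c in (0, 1)}
    curve = {}
    for k in sizes_per_class:
        scores = []
        for _ in range(repeats):
            train = rng.sample(by_class[0], k) + rng.sample(by_class[1], k)
            scores.append(accuracy(fit_centroids(train), test))
        curve[k] = (statistics.fmean(scores), statistics.stdev(scores))
    return curve


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    pool = make_data(200, rng)
    test = make_data(500, rng)
    for k, (mean, sd) in learning_curve(pool, test, (2, 5, 20, 100), 50, rng).items():
        print(f"n per class = {k:3d}: accuracy = {mean:.3f} +/- {sd:.3f}")
```

Reporting the spread across repeats alongside the mean, and fixing the random seed, are the two choices in this sketch that bear most directly on the robustness and reproducibility concerns raised in the abstract.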
Funder
Cancer Research UK
Prostate Cancer UK
European Research Council
Publisher
Springer Science and Business Media LLC
Cited by
3 articles.