Affiliation:
1. Universität Passau, Chair of Distributed Information Systems, Passau, Germany
2. Universität Passau, Chair of Data Science, Passau, Germany
Abstract
Data-centric disciplines such as machine learning and data science have become major research areas within computer science and beyond. However, the development of research processes and tools has not kept pace with the rapid advancement of these disciplines, leaving challenges to the reproducibility, replicability, and comparability of results insufficiently addressed. In this discussion paper, we review existing tools, platforms, and standardization efforts for addressing these challenges. As a common ground for our analysis, we develop an open-science-centred process model for machine learning research, which combines openness and transparency with the core processes of machine learning and data science. Based on the features of over 40 tools, platforms, and standards, we present what we consider the 11 platforms most central to the research process. We conclude that most platforms cover only parts of the requirements for overcoming the identified challenges.
Cited by 7 articles.
1. MLXP: A framework for conducting replicable experiments in Python;Proceedings of the 2nd ACM Conference on Reproducibility and Replicability;2024-06-18
2. A Large-Scale Study of ML-Related Python Projects;Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing;2024-04-08
3. Asset Management in Machine Learning: State-of-research and State-of-practice;ACM Computing Surveys;2022-12-15
4. EMMM: A Unified Meta-Model for Tracking Machine Learning Experiments;2022 48th Euromicro Conference on Software Engineering and Advanced Applications (SEAA);2022-08
5. On the effectiveness of machine learning experiment management tools;Proceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice;2022-05-21