An Exploratory Study on Machine Learning Model Management-Reference-Cited by-同舟云学术

An Exploratory Study on Machine Learning Model Management

Published:2024-08-16 Issue: Volume: Page:
ISSN:1049-331X
Container-title:ACM Transactions on Software Engineering and Methodology
language:en
Short-container-title:ACM Trans. Softw. Eng. Methodol.

Author:

Latendresse Jasmine¹^ORCID,Abedu Samuel¹^ORCID,Abdellatif Ahmad²^ORCID,Shihab Emad¹^ORCID

Affiliation:

1. Concordia University, Canada

2. University of Calgary, Canada

Abstract

Effective model management is crucial for ensuring performance and reliability in Machine Learning (ML) systems, given the dynamic nature of data and operational environments. However, standard practices are lacking, often resulting in ad hoc approaches. To address this, our research provides a clear definition of ML model management activities, processes, and techniques. Analyzing 227 ML repositories, we propose a taxonomy of 16 model management activities and identify 12 unique challenges. We find that 57.9% of the identified activities belong to the maintenance category, with activities like refactoring (20.5%) and documentation (18.3%) dominating. Our findings also reveal significant challenges in documentation maintenance (15.3%) and bug management (14.9%), emphasizing the need for robust versioning tools and practices in the ML pipeline. Additionally, we conducted a survey that underscores a shift towards automation, particularly in data, model, and documentation versioning, as key to managing ML models effectively. Our contributions include a detailed taxonomy of model management activities, a mapping of challenges to these activities, practitioner-informed solutions for challenge mitigation, and a publicly available dataset of model management activities and challenges. This work aims to equip ML developers with knowledge and best practices essential for the robust management of ML models.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3688841

Reference82 articles.

1. D. Gonzalez, T. Zimmermann, and N. Nagappan, “The state of the ml-universe: 10 years of artificial intelligence & machine learning software development on github,” in Proceedings of the 17th International Conference on Mining Software Repositories, 2020, pp. 431–442.

2. S. Amershi, A. Begel, C. Bird, R. DeLine, H. Gall, E. Kamar, N. Nagappan, B. Nushi, and T. Zimmermann, “Software engineering for machine learning: A case study,” in 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). IEEE, 2019, pp. 291–300.

3. Understanding Software-2.0

4. A. Paleyes, R.-G. Urma, and N. D. Lawrence, “Challenges in deploying machine learning: A survey of case studies,” ACM Comput. Surv., 2022.

5. M. Vartak, H. Subramanyam, W.-E. Lee, S. Viswanathan, S. Husnoo, S. Madden, and M. Zaharia, “Modeldb: a system for machine learning model management,” in Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2016, pp. 1–3.