Understanding Software-2.0

Author:

Dilhara Malinda1,Ketkar Ameya1,Dig Danny1

Affiliation:

1. Oregon State University and University of Colorado

Abstract

Enabled by a rich ecosystem of Machine Learning (ML) libraries, programming using learned models , i.e., Software-2.0 , has gained substantial adoption. However, we do not know what challenges developers encounter when they use ML libraries. With this knowledge gap, researchers miss opportunities to contribute to new research directions, tool builders do not invest resources where automation is most needed, library designers cannot make informed decisions when releasing ML library versions, and developers fail to use common practices when using ML libraries. We present the first large-scale quantitative and qualitative empirical study to shed light on how developers in Software-2.0 use ML libraries, and how this evolution affects their code. Particularly, using static analysis we perform a longitudinal study of 3,340 top-rated open-source projects with 46,110 contributors. To further understand the challenges of ML library evolution, we survey 109 developers who introduce and evolve ML libraries. Using this rich dataset we reveal several novel findings. Among others, we found an increasing trend of using ML libraries: The ratio of new Python projects that use ML libraries increased from 2% in 2013 to 50% in 2018. We identify several usage patterns including the following: (i) 36% of the projects use multiple ML libraries to implement various stages of the ML workflows, (ii) developers update ML libraries more often than the traditional libraries , (iii) strict upgrades are the most popular for ML libraries among other update kinds, (iv) ML library updates often result in cascading library updates, and (v) ML libraries are often downgraded (22.04% of cases). We also observed unique challenges when evolving and maintaining Software-2.0 such as (i) binary incompatibility of trained ML models and (ii) benchmarking ML models. Finally, we present actionable implications of our findings for researchers, tool builders, developers, educators, library vendors, and hardware vendors.

Funder

NSF

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Cited by 25 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Characterizing Deep Learning Package Supply Chains in PyPI: Domains, Clusters, and Disengagement;ACM Transactions on Software Engineering and Methodology;2024-01-10

2. Just‐in‐time identification for cross‐project correlated issues;Journal of Software: Evolution and Process;2023-12-26

3. On the sustainability of deep learning projects: Maintainers' perspective;Journal of Software: Evolution and Process;2023-12-13

4. Can Machine Learning Pipelines Be Better Configured?;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

5. From big data to big insights: statistical and bioinformatic approaches for exploring the lipidome;Analytical and Bioanalytical Chemistry;2023-10-25

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3