Method-level Bug Prediction: Problems and Promises

Authors:

Shaiful Chowdhury¹, Gias Uddin², Hadi Hemmati², Reid Holmes³

Affiliation:

1. University of Manitoba, Winnipeg, Canada

2. York University, Toronto, Canada

3. University of British Columbia, Vancouver, Canada

Abstract

Fixing software bugs can be colossally expensive, especially if they are discovered in the later phases of the software development life cycle. As such, bug prediction has been a classic problem for the research community. As of now, a Google Scholar search for the phrase “bug prediction” returns ∼113,000 hits. Despite this staggering effort by the research community, bug prediction research is criticized for not being decisively adopted in practice. A significant problem with existing research is the granularity (i.e., class/file level) at which bug prediction has historically been studied. Practitioners find it difficult and time-consuming to locate bugs at class/file granularity. Consequently, method-level bug prediction has become popular in the past decade. We ask, are these method-level bug prediction models ready for industry use? Unfortunately, the answer is no. The reported high accuracies of these models dwindle significantly if we evaluate them in different realistic time-sensitive contexts. It may seem hopeless at first, but, encouragingly, we show that future method-level bug prediction can be improved significantly. In general, we show how to reliably evaluate future method-level bug prediction models and how to improve them by focusing on four different improvement avenues: building noise-free bug data, addressing concept drift, selecting similar training projects, and developing a mixture of models. Our findings are based on three publicly available method-level bug datasets and a newly built bug dataset of 774,051 Java methods originating from 49 open-source software projects.
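The abstract's central methodological point, evaluating a prediction model only on data that postdates its training data, can be illustrated with a short sketch. The following is a hypothetical illustration, not the authors' code: the column names (commit_date, is_buggy), the feature set, and the choice of a random forest classifier are all assumptions made for the example.

    # A minimal sketch of a time-sensitive evaluation, assuming a
    # per-method dataset with a commit date, code metrics as features,
    # and a buggy/not-buggy label. Hypothetical column names throughout.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import f1_score

    def time_sensitive_eval(df: pd.DataFrame, feature_cols, cutoff):
        # Train only on methods committed before the cutoff date and
        # test on methods committed after it, so no future information
        # leaks into training (unlike random cross-validation).
        train = df[df["commit_date"] < cutoff]
        test = df[df["commit_date"] >= cutoff]
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(train[feature_cols], train["is_buggy"])
        return f1_score(test["is_buggy"], clf.predict(test[feature_cols]))

Under such a chronological split, reported accuracies typically drop relative to random k-fold splits; this is the gap between reported and realistic performance that the abstract refers to.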

Funders

NSERC Alliance

Alberta Innovates CASBE Program

Eyes High Postdoctoral Match-Funding Program

Publisher

Association for Computing Machinery (ACM)

Cited by 3 articles.