Abstract
Upon receiving a new issue report, practitioners start by investigating the defect type, the potential fixing effort needed to resolve the defect and the change impact. Moreover, issue reports contain valuable information, such as, the title, description and severity, and researchers leverage the topics of issue reports as a collective metric portraying similar characteristics of a defect. Nonetheless, none of the existing studies leverage the defect topic, i.e., a semantic cluster of defects of the same nature, such as
Performance, GUI, and Database
, to estimate the change impact that represents the amount of change needed in terms of code churn and the number of files changed. To this end, in this article, we conduct an empirical study on 298,548 issue reports belonging to three large-scale open-source systems, i.e., Mozilla, Apache, and Eclipse, to estimate the change impact in terms of code churn or the number of files changed while leveraging the topics of issue reports. First, we adopt the Embedded Topic Model (ETM), a state-of-the-art topic modelling algorithm, to identify the topics. Second, we investigate the feasibility of predicting the change impact using the identified topics and other information extracted from the issue reports by building eight prediction models that classify issue reports requiring small or large change impact along two dimensions, i.e., the code churn size and the number of files changed. Our results suggest that XGBoost is the best-performing algorithm for predicting the change impact, with an AUC of 0.84, 0.76, and 0.73 for the code churn and 0.82, 0.71, and 0.73 for the number of files changed metric for Mozilla, Apache, and Eclipse, respectively. Our results also demonstrate that the topics of issue reports improve the recall of the prediction model by up to 45%.
Publisher
Association for Computing Machinery (ACM)
Reference133 articles.
1. [n.d.]. Cambridge University Study States Software Bugs Cost Economy $312 Billion Per Year. Retrieved October 18 2022 from https://www.prweb.com/releases/2013/1/prweb10298185.htm/.
2. [n.d.]. This is what your developers are doing 75% of the time and this is the cost you pay. Retrieved June 17 2021 from https://coralogix.com/log-analytics-blog/this-is-what-your-developers-are-doing-75-of-the-time-and-this-is-the-cost-you-pay/.
3. Taxonomy of C Overflow Vulnerabilities Attack
4. CaPBug-A Framework for Automatic Bug Categorization and Prioritization Using NLP and Machine Learning Algorithms
5. Prioritizing lingering bugs
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献