Abstract
Software Defect Detection (SDD) has long been critical to the software development life cycle. A stable defect detection system can both reduce the workload of software testers and improve the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advances. However, these methods still have limitations in reliability and usability. We therefore introduce MSDD-(IA)3, a novel framework that leverages the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-class SDD. The framework builds a detection model on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Because fine-tuning pre-trained LLMs is costly, we inject (IA)3 vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in weighted F1, weighted recall, weighted precision, and the Matthews Correlation Coefficient (MCC). Notably, the trainable parameters of MSDD-(IA)3 amount to only 0.04% of those of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
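The abstract does not spell out how the (IA)3 vectors act inside CodeT5+. In the general (IA)3 formulation, three learned vectors l_k, l_v and l_ff rescale the attention keys, the attention values, and the feed-forward hidden activations element-wise. They are initialized to ones, so the frozen model is unchanged at the start of training, and only these vectors receive gradient updates. The following is a minimal, dependency-free sketch of that general mechanism for one attention head and one feed-forward block. It is not the MSDD-(IA)3 implementation: the function names, the ReLU activation, and the toy matrices are assumptions for illustration, and which CodeT5+ layers receive the vectors is not stated in the abstract.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def scale_columns(m: Matrix, vec: Vector) -> Matrix:
    """Multiply every row element-wise by a learned vector (the (IA)3 rescaling)."""
    return [[x * s for x, s in zip(row, vec)] for row in m]


def softmax(row: Vector) -> Vector:
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def ia3_attention(q: Matrix, k: Matrix, v: Matrix, l_k: Vector, l_v: Vector) -> Matrix:
    """Single-head scaled dot-product attention with (IA)3 vectors on keys and values.

    q, k, v come from frozen projections; only l_k and l_v would be trained.
    """
    k_scaled = scale_columns(k, l_k)
    v_scaled = scale_columns(v, l_v)
    d = len(q[0])
    k_t = [list(col) for col in zip(*k_scaled)]  # K^T
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v_scaled)


def ia3_ffn(x: Matrix, w1: Matrix, w2: Matrix, l_ff: Vector) -> Matrix:
    """Feed-forward block whose hidden activation is rescaled by l_ff.

    w1 and w2 stay frozen; ReLU is an assumed activation for this sketch.
    """
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(scale_columns(hidden, l_ff), w2)


# Toy usage: all-ones vectors reproduce the frozen layer; doubling l_v doubles the output.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 2.0], [0.5, -1.0]]
v = [[1.0, 3.0], [2.0, -1.0]]
frozen_like = ia3_attention(q, k, v, l_k=[1.0, 1.0], l_v=[1.0, 1.0])
rescaled = ia3_attention(q, k, v, l_k=[1.0, 1.0], l_v=[2.0, 2.0])
```

Because the vectors multiply frozen activations element-wise, setting them to ones recovers the original layer, and scaling l_v changes the attention output without touching the attention weights. The only trainable values per adapted block are the entries of l_k, l_v and l_ff, which is why the trainable fraction of the whole model stays very small.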
Funder
the Key Field Research and Development Plan of Guangdong Province
the second batch of cultivation projects of Pazhou Laboratory
Publisher
Springer Science and Business Media LLC