Parameter-Efficient Multi-classification Software Defect Detection Method Based on Pre-trained LLMs

Published:2024-06-19 Issue:1 Volume:17
ISSN:1875-6883
Container-title:International Journal of Computational Intelligence Systems
language:en
Short-container-title:Int J Comput Intell Syst

Author:

Wang Xuanye, Lu Lu, Yang Zhanyu, Tian Qingyan, Lin Haisha

Abstract

Software Defect Detection (SDD) has always been critical to the development life cycle. A stable defect detection system can not only alleviate the workload of software testers but also enhance the overall efficiency of software development. Researchers have recently proposed various artificial intelligence-based SDD methods and achieved significant advancements. However, these methods still exhibit limitations in terms of reliability and usability. Therefore, we introduce MSDD-(IA)3, a novel framework leveraging the pre-trained CodeT5+ and (IA)3 for parameter-efficient multi-classification SDD. This framework constructs a detection model based on pre-trained CodeT5+ to generate code representations while capturing defect-prone features. Considering the high overhead of pre-trained LLMs, we inject (IA)3 vectors into specific layers and update only these injected parameters, which reduces the training cost. Furthermore, leveraging the properties of the pre-trained CodeT5+, we design a novel feature sequence that enriches the input data by combining source code with Natural Language (NL)-based expert metrics. Our experimental results on 64K real-world Python snippets show that MSDD-(IA)3 outperforms state-of-the-art SDD methods, including PM2-CNN, in terms of F1-weighted, Recall-weighted, Precision-weighted, and Matthews Correlation Coefficient. Notably, the trainable parameters of MSDD-(IA)3 amount to only 0.04% of the parameters of the original CodeT5+. Our experimental data and code are available at https://gitee.com/wxyzjp123/msdd-ia3/.
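The parameter-efficiency idea in the abstract can be illustrated with a minimal sketch. It assumes the usual formulation of (IA)3, in which a frozen layer's output is multiplied element-wise by a small learned vector initialised to ones. The class `IA3Linear`, its dimensions, and the random stand-in weights are hypothetical and are not taken from the MSDD-(IA)3 code. The abstract does not say which layers receive the vectors, so the sketch shows a single linear layer only.

```python
import random


class IA3Linear:
    """Toy frozen linear layer whose output is rescaled by a trainable vector.

    Hypothetical illustration only; not the authors' implementation.
    """

    def __init__(self, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        # Frozen "pre-trained" weights (random stand-ins for this sketch).
        self.weight = [[rng.gauss(0.0, 0.02) for _ in range(out_dim)]
                       for _ in range(in_dim)]
        # Trainable scaling vector, initialised to ones so that the layer
        # initially behaves exactly like the frozen layer.
        self.scale = [1.0] * out_dim

    def frozen_forward(self, x):
        # Plain matrix-vector product with the frozen weights.
        return [sum(x[i] * self.weight[i][j] for i in range(len(x)))
                for j in range(len(self.scale))]

    def forward(self, x):
        # Element-wise rescaling of the frozen output by the learned vector.
        return [s * h for s, h in zip(self.scale, self.frozen_forward(x))]

    def trainable_fraction(self):
        # Share of parameters that an optimiser would update.
        frozen = len(self.weight) * len(self.weight[0])
        return len(self.scale) / (frozen + len(self.scale))


layer = IA3Linear(in_dim=8, out_dim=4)
x = [0.5] * 8
unchanged = layer.forward(x) == layer.frozen_forward(x)  # identity at init
layer.scale[0] = 2.0  # pretend an optimiser updated one entry
print(unchanged, layer.trainable_fraction())
```

In this toy layer the trainable share is 4 out of 36 parameters. The 0.04% figure reported in the abstract refers to the full CodeT5+ model, where the frozen weight matrices are far larger relative to the injected vectors.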

Funder

the Key Field Research and Development Plan of Guangdong Province

the second batch of cultivation projects of Pazhou Laboratory

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s44196-024-00551-3.pdf
