Improve Classification of Security Bug Reports using fasttext. A Machine Learning Based Approach

Published: 2022-11-15

Author:

Alqahtani Sultan S.¹

Affiliation:

1. Al-Imam Mohammad Ibn Saud Islamic University

Abstract

Software developers must handle security bug reports (SBRs) before they are widely disclosed and the system becomes vulnerable to attackers. Bug tracking systems may contain many security-related reports that are not labelled as SBRs. Finding these unlabelled SBRs is therefore a challenge, and solving it would help security engineers identify security issues quickly and accurately. Although many methods have been proposed for classifying SBRs, selecting an accurate, high-performance classification algorithm remains an open problem. This motivates us to address the challenges faced by state-of-the-art SBR classification methods by selecting a high-performance machine learning algorithm. The main goal of this paper is therefore to automate the process of determining which bug reports can be labelled as SBRs through the use of machine learning techniques. First, we extracted 45,940 bug reports from publicly available datasets covering five software repositories (e.g., the work of Peters et al. and Shu et al.). Second, we conducted a study on the classification of SBRs using machine learning, in which we built a fasttext classifier. We then examined the accuracy of fasttext in detecting SBRs. Our results show that fasttext can identify SBRs with an average F1 score of 0.81. Furthermore, we investigated the generalizability of SBR identification by applying cross-project validation, and our results show that the fasttext classifier achieves an average F1 score of 0.65. Data and results are available at https://github.com/isultane/fasttext_classifications.
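The workflow summarized in the abstract (label bug reports, train a text classifier, score it with F1, then test across projects) can be illustrated with a minimal sketch. This is not the authors' code: the label names `__label__SBR` and `__label__NSBR`, the `Report` record, the helper names, and the leave-one-project-out split are all assumptions made for illustration, and the actual cross-project protocol used in the paper may differ. The sketch uses only the Python standard library and stops short of training a model.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class Report:
    """Hypothetical record for one bug report (field names are assumptions)."""
    project: str
    text: str
    is_security: bool


def to_fasttext_line(text: str, is_security: bool) -> str:
    """Format one report as '__label__<name> <text>' for supervised training.

    The '__label__' prefix is fastText's default label marker; the label
    names SBR/NSBR are illustrative choices.
    """
    label = "__label__SBR" if is_security else "__label__NSBR"
    # One example per line is expected, so collapse newlines and extra spaces.
    return f"{label} {' '.join(text.split())}"


def f1_score(y_true: List[bool], y_pred: List[bool]) -> float:
    """F1 for the positive (security) class: harmonic mean of precision and recall."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def leave_one_project_out(
    reports: List[Report],
) -> Iterator[Tuple[str, List[Report], List[Report]]]:
    """Yield (held_out_project, train, test) splits, one per project.

    This is one common way to set up cross-project validation; it is an
    assumption here, not a description of the paper's exact procedure.
    """
    for held_out in sorted({r.project for r in reports}):
        train = [r for r in reports if r.project != held_out]
        test = [r for r in reports if r.project == held_out]
        yield held_out, train, test
```

With the `fasttext` Python package installed, the lines produced by `to_fasttext_line` could be written to a file and passed to `fasttext.train_supervised(input=path)`, and the resulting model's `predict` output compared against the true labels with `f1_score`.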

Publisher

Research Square Platform LLC

References (35 articles)

1. Floris, P., & Vogt Harald, H. (2010). “How to save on software maintenance costs,” Omnext white paper, vol. SOURCE 2 V.

2. Shu, R., Xia, T., Williams, L., & Menzies, T. (2019). “Better Security Bug Report Classification via Hyperparameter Optimization,” https://arxiv.org/pdf/1905.06872.pdf.

3. Chawla, I., & Singh, S. K. (2014). “Automatic bug labeling using semantic information from LSI,” in Seventh International Conference on Contemporary Computing (IC3), Aug. 2014, pp. 376–381, doi: 10.1109/IC3.2014.6897203.

4. Bozorgi, M., Saul, L. K., Savage, S., & Voelker, G. M. (2010). “Beyond heuristics: learning to classify vulnerabilities and predict exploits,” in Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’10, p. 105, doi: 10.1145/1835804.1835821.

5. Peters, F., Tun, T. T., Yu, Y., & Nuseibeh, B. (2019). “Text Filtering and Ranking for Security Bug Report Prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 6, pp. 615–631, Jun. 2019, doi: 10.1109/TSE.2017.2787653.