A comprehensive analysis on software vulnerability detection datasets: trends, challenges, and road ahead
Published:2024-07-23
Issue:5
Volume:23
Page:3311-3327
ISSN:1615-5262
Container-title:International Journal of Information Security
language:en
Short-container-title:Int. J. Inf. Secur.
Author:
Guo Yuejun, Bettaieb Seifeddine, Casino Fran
Abstract
As society’s dependence on information and communication systems (ICTs) grows, so does the necessity of guaranteeing the proper functioning and use of such systems. In this context, it is critical to enhance the security and robustness of the DevSecOps pipeline through timely vulnerability detection. Usually, AI-based models enable desirable features such as automation, performance, and efficacy. However, the quality of such models highly depends on the datasets used during the training stage. The latter encompasses a series of challenges yet to be solved, such as access to extensive labelled datasets with specific properties, such as well-represented and balanced samples. This article explores the current state of practice of software vulnerability datasets and provides a classification of the main challenges and issues. After an extensive analysis, it describes a set of guidelines and desirable features that datasets should guarantee. The latter is applied to create a new dataset, which fulfils these properties, along with a descriptive comparison with the state of the art. Finally, a discussion on how to foster good practices among researchers and practitioners sets the ground for further research and continued improvement within this critical domain.
Funder
Universitat Rovira i Virgili
Publisher
Springer Science and Business Media LLC