Affiliation:
1. University of Economics and Technical Industries
2. Posts and Telecommunications Institute of Technology Hanoi
Abstract
Detecting vulnerabilities in software written in C and C++ attracts considerable attention today. This paper proposes a new framework, DrCSE, to improve software vulnerability detection. It uses an intelligent computation technique that combines two methods, data rebalancing and representation learning, to analyze the code property graph (CPG) of the source code and detect the abnormal behavior associated with software vulnerabilities. To do so, DrCSE combines three main processing techniques: i) building source code feature profiles, ii) rebalancing data, and iii) contrastive learning. The first technique extracts the source code’s features from the vertices and edges of the CPG. Data rebalancing supports the training process by balancing the experimental dataset. Finally, contrastive learning learns the important features of the source code by pulling similar samples together while pushing outliers away. Experiments on the Verum dataset demonstrate the superiority of the DrCSE framework for detecting source code security vulnerabilities. The proposed method performs well on all metrics, notably achieving Precision and Recall scores of 39.35% and 69.07%, respectively. It outperforms other approaches, with a 5% boost in Precision and a 5% boost in Recall. According to our survey, this is the best result reported to date for software vulnerability detection on the Verum dataset.
Publisher
Research Square Platform LLC