NG_MDERANK: A software vulnerability feature knowledge extraction method based on N‐gram similarity

Author:

Wu Xiaoxue¹, Weng Shiyu¹, Zheng Bin², Zheng Wei², Chen Xiang³, Sun Xiaobin¹

Affiliation:

1. School of Information Engineering, Yangzhou University, Yangzhou, China

2. School of Software, Northwestern Polytechnical University, Xi'an, China

3. School of Information Science and Technology, Nantong University, Nantong, China

Abstract

As software grows in size and complexity, the number of software vulnerabilities increases, leading to a range of serious security issues. Open‐source vulnerability reports and documentation are a convenient resource for researchers working on vulnerability analysis and detection. However, the quality of these data sources varies, and the data are duplicated and poorly correlated, so using them often requires substantial manual curation and analysis. To address the scattered, heterogeneous, and weakly correlated data in traditional vulnerability repositories, this paper proposes a software vulnerability feature knowledge extraction method that combines the N‐gram model with mask similarity. The method extracts N‐gram candidate keywords, generates masked text from them, and extracts vulnerability feature knowledge by calculating the similarity of the masked text. It analyzes samples efficiently and stably when the sample set is large and complex, and yields high‐value semi‐structured data. A second round of knowledge cleaning and extraction over the semi‐structured results then produces the final node, relationship, and attribute information. Based on these results, a knowledge graph of the software vulnerability domain is constructed to explore the semantic features and entity relationships of vulnerabilities in depth, which supports the efficient study of software security problems and the remediation of vulnerabilities. The effectiveness and superiority of the proposed method are verified by comparing it with several traditional keyword extraction algorithms on Common Weakness Enumeration (CWE) and Common Vulnerabilities and Exposures (CVE) vulnerability data.
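The idea of combining N‐gram candidates with mask similarity can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which similarity model is used, so a bag‐of‐words cosine stands in for it here, and the stopword list, the `[MASK]` token, and all function names (`tokenize`, `ngram_candidates`, `mask`, `cosine`, `rank_keywords`) are assumptions made for illustration. A candidate is treated as more important when masking it makes the text less similar to the original.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not taken from the paper.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or",
             "is", "are", "by", "that", "can", "be", "with", "on", "for"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_candidates(tokens, max_n=3):
    """Collect distinct n-grams (n = 1..max_n) that neither start nor end with a stopword."""
    seen = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            if gram not in seen:
                seen.append(gram)
    return seen


def mask(tokens, gram, mask_token="[MASK]"):
    """Replace every occurrence of the n-gram with one mask token per word."""
    out, i, n = [], 0, len(gram)
    while i < len(tokens):
        if tuple(tokens[i:i + n]) == gram:
            out.extend([mask_token] * n)
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


def cosine(a, b):
    """Bag-of-words cosine similarity (a stand-in for an embedding model)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def rank_keywords(text, max_n=3, top_k=5):
    """Rank candidates by similarity between the original and masked text, lowest first."""
    tokens = tokenize(text)
    scored = [(" ".join(g), cosine(tokens, mask(tokens, g)))
              for g in ngram_candidates(tokens, max_n)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_k]


if __name__ == "__main__":
    description = ("A buffer overflow in the parser allows remote attackers to "
                   "execute arbitrary code. The buffer overflow is triggered by "
                   "a crafted packet.")
    for phrase, similarity in rank_keywords(description):
        print(f"{similarity:.3f}  {phrase}")
```

With a bag‐of‐words similarity, longer and more frequent phrases are favored simply because masking them removes more tokens; a semantic similarity model would be needed to rank candidates by meaning rather than by surface overlap.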

Funder

National Natural Science Foundation of China

Publisher

Wiley

