Double-Layer Detection Model of Malicious PDF Documents Based on Entropy Method with Multiple Features

Published:2023-07-23 Issue:7 Volume:25 Page:1099
ISSN:1099-4300
Container-title:Entropy
language:en
Short-container-title:Entropy

Author:

Song Enzhou¹, Hu Tao¹, Yi Peng¹, Wang Wenbo¹

Affiliation:

1. Information Technology Institute, Information Engineering University, Zhengzhou 450001, China

Abstract

Traditional detection techniques for malicious PDF documents usually build a rule or feature library for specific vulnerabilities; they are therefore suited only to a single detection target and lack resistance to anti-detection techniques. To address these shortcomings, we build a double-layer detection model for malicious PDF documents based on an entropy method with multiple features. First, we address the single-detection-target problem by fusing 212 features, comprising 130 basic features (such as objects, structure, content stream, and metadata) and 82 dangerous features (such as suspicious functions and encoding functions), which effectively resists obfuscation and encryption. Second, we generate the best feature set (153 features in total) by applying an entropy method based on RReliefF and MIC (EMBORAM) to PDF samples covering 37 typical document vulnerabilities, which effectively resists anti-detection methods such as data filling and imitation attacks. Finally, we build a double-layer processing framework that detects samples efficiently using an AdaBoost-optimized random forest algorithm and a robustness-optimized support vector machine algorithm. Compared with traditional static detection methods, the model performs better on various evaluation criteria: the average detection time per document is 1.3 ms, and the accuracy reaches 95.9%.
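The abstract does not spell out how EMBORAM combines its two relevance measures, so the following is only a minimal sketch of the generic entropy weight method, assuming that per-feature RReliefF and MIC scores have already been computed elsewhere. The function names, feature names, and score values are hypothetical and are not taken from the paper.

```python
import math


def entropy_weights(score_columns):
    """Generic entropy weight method: one weight per criterion.

    score_columns is a list of equal-length lists of non-negative scores,
    one list per criterion (assumed here: RReliefF scores and MIC scores).
    A criterion whose scores are more dispersed gets a larger weight.
    """
    n = len(score_columns[0])
    divergences = []
    for column in score_columns:
        total = sum(column)
        probs = [x / total for x in column]
        # Shannon entropy scaled to [0, 1]; terms with p == 0 contribute 0.
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    if norm == 0:
        # Every criterion is perfectly uniform: fall back to equal weights.
        return [1.0 / len(score_columns)] * len(score_columns)
    return [d / norm for d in divergences]


def rank_features(names, score_columns):
    """Return feature names sorted by entropy-weighted combined score."""
    weights = entropy_weights(score_columns)
    combined = []
    for i, name in enumerate(names):
        # Scale each criterion by its maximum so the two are comparable.
        score = sum(w * column[i] / max(column)
                    for w, column in zip(weights, score_columns))
        combined.append((score, name))
    combined.sort(reverse=True)
    return [name for _, name in combined]


# Hypothetical feature names and scores, for illustration only.
names = ["obj_count", "js_count", "openaction", "title_len"]
relieff_scores = [0.10, 0.40, 0.45, 0.05]
mic_scores = [0.20, 0.30, 0.30, 0.20]
ranking = rank_features(names, [relieff_scores, mic_scores])
```

Keeping the top k names of such a ranking would be one way to arrive at a reduced feature set; how the paper actually selects its 153 features is not stated in the abstract.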

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/25/7/1099/pdf

References: 38 articles.

1. Yu (2021). A Survey of Research on Malicious Document Detection. J. Cyber Secur.

2. Nissim, N., Cohen, A., Moskovitch, R., Shabtai, A., Edry, M., Bar-Ad, O., and Elovici, Y. (2014, January 24–26). ALPD: Active Learning Framework for Enhancing the Detection of Malicious PDF Files. Proceedings of the 2014 IEEE Joint Intelligence and Security Informatics Conference, The Hague, The Netherlands.

3. Wang, Y. (2021, January 23–25). The De-Obfuscation Method in the Static Detection of Malicious PDF Documents. Proceedings of the 7th Annual International Conference on Network and Information Systems for Computers, Guiyang, China.

4. Lei (2022). PDF Document Detection Model Based on System Calls and Data Provenance. J. Comput. Appl.

5. Lu, X., Wang, F., Jiang, C., and Lio, P. (2021). A Universal Malicious Documents Static Detection Framework Based on Feature Generalization. Appl. Sci., 11.

Cited by 1 article.

1. Malicious Documents Detection and Classification Using Hybrid Boosted-Support Vector Machine with High Dimensional Features. 2024 Third International Conference on Distributed Computing and Electrical Circuits and Electronics (ICDCECE), 2024-04-26.