Towards enhanced PDF maldocs detection with feature engineering: design challenges

Published: 2022-05-17 Issue: 28 Volume: 81 Page: 41103-41130
ISSN: 1380-7501
Container-title: Multimedia Tools and Applications
Language: en
Short-container-title: Multimed Tools Appl

Author:

Falah Ahmed, Pokhrel Shiva Raj, Pan Lei, de Souza-Daw Anthony

Abstract

In this paper, we perform an in-depth analysis of a large corpus of PDF maldocs to identify the key set of significantly important features and help in maldoc detection. Existing industry-based tools for the detection are inefficient and cannot prevent PDF maldocs because they are generic and depend primarily on a signature-based approach. Besides, several other methods developed by academics suffer heavily from reduced effectiveness. The feature-set using machine learning classifiers is prone to various known attacks, such as mimicry and parser confusion. Also, we discover that increasingly more malicious files i) contain evasive and obfuscated JavaScript code, ii) include hidden contents (mostly outside the objects), iii) have a corrupted document structure, and iv) usually contain short JavaScript code blocks. We utilise maldoc attacks’ evolution over a decade to highlight the essential features (e.g., concept drifts) that impact detectors and classifiers.

Funder

Deakin University

Publisher

Springer Science and Business Media LLC

Subject

Computer Networks and Communications, Hardware and Architecture, Media Technology, Software

Link

https://link.springer.com/content/pdf/10.1007/s11042-022-11960-x.pdf

References: 30 articles.

1. Adobe Systems Incorporated (2007) JavaScript™ for Acrobat® API Reference. Available online at https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/js_api_reference.pdf. Accessed Sep 2019

2. Carlin D, O’Kane P, Sezer S (2019) A cost analysis of machine learning using dynamic runtime opcodes for malware detection. Comput Secur 85:138–155

3. Carmony C, Hu X, Yin H, Bhaskar AV, Zhang M (2016) Extract Me If You Can: Abusing PDF Parsers in Malware Detectors. In: NDSS, pp 1–15

4. Dang H, Huang Y, Chang E-C (2017) Evading Classifiers by Morphing in the Dark. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. Dallas, Texas, USA: ACM, pp 119–133

5. Ding Y, Wu R, Zhang X (2019) Ontology-based knowledge representation for malware individuals and families. Computers & Security, p 101574

Cited by 2 articles.

1. An Enhanced Feature-Based Hybrid Approach for Adversarial PDF Malware Detection; 2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT); 2024-05-02

2. PDF Malware Detection Based on Fuzzy Unordered Rule Induction Algorithm (FURIA); Applied Sciences; 2023-03-21