Abstract
In this paper, we perform an in-depth analysis of a large corpus of PDF maldocs (malicious documents) to identify the key set of significant features that help in maldoc detection. Existing industry tools for detection are inefficient and fail to stop PDF maldocs because they are generic and depend primarily on a signature-based approach. Several methods developed in academia also suffer heavily from reduced effectiveness: the feature sets used by their machine learning classifiers are prone to various known attacks, such as mimicry and parser confusion. We also find that a growing number of malicious files i) contain evasive and obfuscated JavaScript code, ii) include hidden content (mostly outside the objects), iii) have a corrupted document structure, and iv) usually contain short JavaScript code blocks. We use the evolution of maldoc attacks over a decade to highlight the essential features (e.g., concept drift) that impact detectors and classifiers.
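To make the four file characteristics listed in the abstract more concrete, the following is a minimal Python sketch of how coarse indicators of this kind might be counted from raw PDF bytes. It is an illustration only and not the authors' feature set or extraction method: the function name `structural_features` and the dictionary keys are hypothetical, and a plain byte scan misses hex-escaped names and content held in compressed object streams.

```python
import re


def structural_features(data: bytes) -> dict:
    """Count a few coarse structural indicators in raw PDF bytes.

    Hypothetical helper for illustration; the paper's actual features may differ.
    """
    # Object openers look like "12 0 obj"; each should be closed by "endobj".
    obj_open = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))
    obj_close = len(re.findall(rb"\bendobj\b", data))

    # Non-whitespace bytes after the last %%EOF marker lie outside the
    # regular document structure.
    last_eof = data.rfind(b"%%EOF")
    trailing = len(data[last_eof + 5:].strip()) if last_eof != -1 else 0

    return {
        "has_header": data.lstrip().startswith(b"%PDF-"),
        "js_keywords": len(re.findall(rb"/(?:JavaScript|JS)\b", data)),
        "unbalanced_objects": obj_open != obj_close,
        "missing_eof": last_eof == -1,
        "bytes_after_eof": trailing,
    }
```

Such counts could at most serve as inputs to a classifier. As the abstract notes, feature sets of this kind are themselves exposed to mimicry and parser-confusion attacks, so the sketch should not be read as a detector.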
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Media Technology, Software
Cited by
2 articles.
1. An Enhanced Feature-Based Hybrid Approach for Adversarial PDF Malware Detection;2024 6th International Conference on Electrical Engineering and Information & Communication Technology (ICEEICT);2024-05-02
2. PDF Malware Detection Based on Fuzzy Unordered Rule Induction Algorithm (FURIA);Applied Sciences;2023-03-21