Authors:
Al-Fayoumi Mustafa, Abu Al-Haija Qasem, Armoush Rakan, Amareen Christine
Abstract
With the increasing number of malicious PDF files used for cyberattacks, it is essential to develop efficient and accurate classifiers to detect and prevent these threats. Machine Learning (ML) models have been used successfully to detect malicious PDF files. This paper presents XAI-PDF, an efficient system for malicious PDF detection designed to enhance accuracy and minimize decision-making time on a modern dataset, the Evasive-PDFMal2022 dataset. The proposed method optimizes malicious PDF classifier performance by employing feature engineering guided by Shapley Additive Explanations (SHAP). Specifically, the model development approach comprises four phases: data preparation, model building, model explainability, and derived features. Utilizing the interpretability of SHAP values, crucial features are identified and new ones are generated, resulting in an improved classification model that showcases the effectiveness of interpretable AI techniques in enhancing model performance. Various interpretable ML models were implemented, with the Lightweight Gradient Boosting Machine (LGBM) outperforming the other classifiers. The Explainable Artificial Intelligence (XAI) global surrogate model generated explanations for LGBM predictions. Experimental comparisons of XAI-PDF with baseline methods revealed its superiority in achieving higher accuracy, precision, and F1-scores with minimal False Positive (FP) and False Negative (FN) rates (99.9%, 100%, 99.89%, 0.000, and 0.002, respectively). Additionally, XAI-PDF requires only 1.36 milliseconds per record for predictions, demonstrating increased resilience in detecting evasive malicious PDF files compared to state-of-the-art methods.
Cited by
1 article.