Dynamic Malware Classification and API Categorisation of Windows Portable Executable Files Using Machine Learning

Published: 2024-01-25
Volume: 14
Issue: 3
Page: 1015
ISSN: 2076-3417
Journal: Applied Sciences
Language: en

Authors:

Syeda Durre Zehra¹, Asghar Mamoona Naveed¹

Affiliation:

1. School of Computer Science, University of Galway, H91 TK33 Galway, Ireland

Abstract

The rise of malware attacks presents a significant cyber-security challenge, with advanced techniques and offline command-and-control (C2) servers causing disruptions and financial losses. This paper proposes a methodology for dynamic malware analysis and classification using malware Portable Executable (PE) files from the MalwareBazaar repository, and suggests effective strategies to mitigate the impact of evolving malware threats. For this purpose, a five-level approach to data management and experiments was used: (1) generation of a customised dataset by analysing a total of 582 malware and 438 goodware samples of Windows PE files; (2) feature extraction and feature scoring based on Chi2 and Gini importance; (3) empirical evaluation of six state-of-the-art baseline machine learning (ML) models, namely Logistic Regression (LR), Support Vector Machine (SVM), Naive Bayes (NB), Random Forest (RF), XGBoost (XGB), and K-Nearest Neighbour (KNN), on the curated dataset; (4) malware family classification using VirusTotal APIs; and, finally, (5) categorisation of 23 distinct APIs from 266 malware APIs. The results indicate that Gini's method takes a more holistic view of feature scoring, as it considers a wider range of API activities. Among all applied ML models, RF achieved the highest precision (0.99), accuracy (0.96), area under the curve (AUC, 0.98), and F1-score (0.96), with a true-positive rate (TPR) of 0.93 and a false-positive rate (FPR) of 0.0098. The results show that Trojans (27%) and ransomware (22%) are the riskiest of the 11 malware families. In API categorisation, Windows-based APIs (22%), file system APIs (12%), and registry manipulation APIs (8.2%) proved important for detecting malicious activity.
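Step (2) scores features with Chi2 and Gini importance. The sketch below illustrates both ideas in dependency-free Python on a hypothetical binary matrix of API-call presence. The API names, the data, and the helper names (`chi2_scores`, `gini`, `gini_decrease`) are illustrative assumptions, not the paper's code or dataset. The Chi2 score follows the formulation used by scikit-learn's `chi2`; the Gini part shows only the impurity decrease of a single split, which a random forest's Gini importance accumulates over all splits in all trees.

```python
"""Chi2 and Gini-based scoring of binary API-call features (illustrative sketch).

Hypothetical toy data: each row records whether a sample called a given API
(1) or not (0); labels are 1 = malware, 0 = goodware.
"""

# Hypothetical API names and samples, not taken from the paper's dataset.
APIS = ["RegSetValueExW", "CryptEncrypt", "GetSystemTimeAsFileTime"]
X = [
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
    [0, 0, 1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]


def chi2_scores(X, y):
    """Chi-square statistic per feature, following the formulation used by
    scikit-learn's chi2: observed = per-class sum of the feature's values,
    expected = feature total * class frequency."""
    n = len(y)
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * y.count(c) / n
            if expected > 0:  # a feature that is never present scores 0 here
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of malware samples
    return 1.0 - p ** 2 - (1.0 - p) ** 2


def gini_decrease(X, y, j):
    """Weighted Gini impurity decrease from one split on binary feature j.
    A random forest's Gini importance sums such decreases over every split
    on the feature in every tree; this is only that single-split building block."""
    left = [label for row, label in zip(X, y) if row[j] == 0]
    right = [label for row, label in zip(X, y) if row[j] == 1]
    n = len(y)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - weighted


if __name__ == "__main__":
    chi2 = chi2_scores(X, y)
    for j, api in enumerate(APIS):
        print(f"{api:25s} chi2={chi2[j]:.3f} gini_decrease={gini_decrease(X, y, j):.3f}")
```

On this toy data, the API present only in malware rows (`CryptEncrypt`) scores highest under both measures, while the API present in every row (`GetSystemTimeAsFileTime`) scores zero, because it carries no information about the label.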
This paper uses a dual approach to feature reduction and scoring, which improves the F1-score by 2%; together with the inclusion of AUC and specificity metrics, this distinguishes it from existing research (Section Comparative Analysis with Existing Approaches). The newly generated dataset is publicly available in the GitHub repository (Data Availability Statement) to support other researchers working on dynamic malware analysis.
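The metrics reported above (TPR, FPR, precision, accuracy, F1-score, specificity, and AUC) all follow from a confusion matrix and the classifier's scores. The following is a minimal, dependency-free Python sketch of how they relate; the counts and scores are hypothetical and are not the paper's results, and the function names are illustrative.

```python
"""Evaluation metrics for a binary malware (1) vs goodware (0) classifier.

The counts and scores below are hypothetical and are not the paper's results.
"""


def confusion_metrics(tp, fp, tn, fn):
    """Threshold-based metrics derived from confusion-matrix counts."""
    tpr = tp / (tp + fn)                  # true-positive rate (recall, sensitivity)
    fpr = fp / (fp + tn)                  # false-positive rate
    specificity = tn / (tn + fp)          # true-negative rate, equal to 1 - FPR
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * tpr / (precision + tpr)  # harmonic mean of precision and recall
    return {
        "tpr": tpr,
        "fpr": fpr,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


def roc_auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen malware sample gets a
    higher score than a randomly chosen goodware sample (ties count as 1/2)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for name, value in confusion_metrics(tp=90, fp=2, tn=98, fn=10).items():
        print(f"{name:12s} {value:.4f}")
    print("auc", roc_auc([0.9, 0.8, 0.4, 0.7, 0.3, 0.1], [1, 1, 1, 0, 0, 0]))
```

The pairwise AUC formulation is quadratic in the number of samples, which is acceptable for a dataset of about a thousand PE files; library implementations usually compute the same value from the ROC curve or from score ranks.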

Funder

School of Computer Science, University of Galway, Ireland

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/3/1015/pdf

Cited by 4 articles

1. Going beyond API Calls in Dynamic Malware Analysis: A Novel Dataset. Electronics, 2024-09-06.

2. SINNER: A Reward-Sensitive Algorithm for Imbalanced Malware Classification Using Neural Networks with Experience Replay. Information, 2024-07-23.

3. CSMC: A Secure and Efficient Visualized Malware Classification Method Inspired by Compressed Sensing. Sensors, 2024-06-30.

4. Securing Edge Devices: Malware Classification with Dual-Attention Deep Network. Applied Sciences, 2024-05-28.