Abstract
Recognizing malware is critical in cybersecurity because it makes it possible to prevent malicious files from being downloaded and executed. One possible approach is to analyze an executable's Application Programming Interface (API) calls, which can be recorded with sandbox-based tools such as Cuckoo or CAPEv2. The resulting chain of calls can then be used to classify the file as benign or malicious. This work compares six modern shallow learning and deep learning techniques for tabular data on two datasets of API calls containing malware and goodware, in which each instance is represented by its chain of API calls. The results show the quality of shallow learning approaches based on tree ensembles, such as CatBoost, in terms of F1-macro score, Area Under the ROC curve (AUC ROC), and training time, making them optimal for inference in Edge AI solutions. The results are then analyzed with SHAP, an explainable AI technique, to identify the API calls that most influence the classification, i.e., those most strongly associated with malware and with goodware.
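As an illustration of the two quality metrics named above, the following minimal sketch computes F1-macro and AUC ROC from scratch on a small hypothetical set of labels and scores. The data, the 0.5 decision threshold, and the function names `f1_macro` and `roc_auc` are assumptions made for this example; they are not taken from the paper, whose own evaluation code is not shown here.

```python
def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def roc_auc(y_true, y_score, positive=1):
    """AUC ROC as the probability that a random positive instance
    receives a higher score than a random negative one (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == positive]
    neg = [s for t, s in zip(y_true, y_score) if t != positive]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute AUC ROC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = malware, 0 = goodware; scores are made up.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

print(f"F1-macro: {f1_macro(y_true, y_pred):.4f}")
print(f"AUC ROC:  {roc_auc(y_true, y_score):.4f}")
```

F1-macro weights both classes equally regardless of their frequency, which matters when malware and goodware are imbalanced, while AUC ROC is threshold-independent because it depends only on how the scores rank the instances.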
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science