Going beyond API Calls in Dynamic Malware Analysis: A Novel Dataset

Author:

Ilić Slaviša12ORCID,Gnjatović Milan2ORCID,Tot Ivan1,Jovanović Boriša1ORCID,Maček Nemanja3ORCID,Gavrilović Božović Marijana4ORCID

Affiliation:

1. Department of Military Electronic Engineering, University of Defence, Veljka Lukića Kurjaka 1, 11000 Belgrade, Serbia

2. Department of Information Technology, University of Criminal Investigation and Police Studies, Cara Dušana 196, 11080 Beograd, Serbia

3. School of Electrical and Computer Engineering, Academy of Technical and Art Applied Studies, Vojvode Stepe 283, 11000 Beograd, Serbia

4. Faculty of Engineering, University of Kragujevac, Sestre Janjić 6, 34000 Kragujevac, Serbia

Abstract

Automated sandbox-based analysis systems are dominantly focused on sequences of API calls, which are widely acknowledged as discriminative and easily extracted features. In this paper, we argue that an extension of the feature set beyond API calls may improve the malware detection performance. For this purpose, we apply the Cuckoo open-source sandbox system, carefully configured for the production of a novel dataset for dynamic malware analysis containing 22,200 annotated samples (11,735 benign and 10,465 malware). Each sample represents a full-featured report generated by the Cuckoo sandbox when a corresponding binary file is submitted for analysis. To support our position that the discriminative power of the full-featured sandbox reports is greater than the discriminative power of just API call sequences, we consider samples obtained from binary files whose execution induced API calls. In addition, we derive an additional dataset from samples in the full-featured dataset, whose samples contain only information on API calls. In a three-way factorial design experiment (considering the feature set, the feature representation technique, and the random forest model hyperparameter settings), we trained and tested a set of random forest models in a two-class classification task. The obtained results demonstrate that resorting to full-featured sandbox reports improves malware detection performance. The accuracy of 95.56 percent obtained for API call sequences was increased to 99.74 percent when full-featured sandbox reports were considered.

Publisher

MDPI AG

Reference33 articles.

1. Malware Analysis by Combining Multiple Detectors and Observation Windows;Ficco;IEEE Trans. Comput.,2022

2. Ho, T.K. (1995, January 14–16). Random decision forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada.

3. Mira, F. (2019, January 1–3). A Review Paper of Malware Detection Using API Call Sequences. Proceedings of the 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), Riyadh, Saudi Arabia.

4. GRASE: Granulometry Analysis with Semi Eager Classifier to Detect Malware;Deore;Int. J. Interact. Multimed. Artif. Intell.,2024

5. Düzgün, B., Çayır, A., Demirkıran, F., Kahya, C.N., Gençaydın, B., and Dağ, H. (2022). Benchmark Static API Call Datasets for Malware Family Classification. arXiv.

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3