An ensemble approach for imbalanced multiclass malware classification using 1D-CNN

Published: 2023-11-14 Volume: 9 Page: e1677
ISSN: 2376-5992
Container-title: PeerJ Computer Science
Language: en

Author:

Panda Binayak¹, Bisoyi Sudhanshu Shekhar², Panigrahy Sidhanta³

Affiliation:

1. Department of Computer Science and Engineering, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India

2. Department of Computer Science and Information Technology, Institute of Technical Education and Research, Siksha ‘O’ Anusandhan (Deemed to be) University, Bhubaneswar, Odisha, India

3. Haas School of Business, University of California, Berkeley, Berkeley, CA, United States of America

Abstract

Everyday dependence on the internet and on computer programs shows how central software has become to daily life. This demand motivates malware developers to create more malware, in both quantity and variety. Because malware authors use code obfuscation techniques, researchers face constant hurdles in protecting systems from the resulting hazards and risks. Metamorphic and polymorphic variants can easily evade the widely used signature-based detection procedures. To analyze the behavior of such a vast number of malware variants, researchers have turned to deep learning approaches rather than traditional machine learning techniques. Beyond classifying malware against benign programs, researchers have also been drawn to categorizing malware into its own classes to examine the behavioral differences between them. To investigate the relationships among application programming interface (API) calls within API sequences and to classify those sequences, this work uses a one-dimensional convolutional neural network (1D-CNN) model to solve a multiclass classification problem. Feature vectors for the distinct APIs are learned from the API sequences using the Word2Vec word embedding approach with the skip-gram model. Using the one-vs.-rest approach, 1D-CNN models are trained to categorize malware, and all of them are then combined with a proposed ModifiedSoftVoting algorithm to improve classification. On the open benchmark dataset Mal-API-2019, the proposed ensembled 1D-CNN architecture achieves improved evaluation scores, with an accuracy of 0.90, a weighted average F1-score of 0.90, and an AUC score above 0.96 for every malware class.
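Two mechanical steps named in the abstract can be illustrated without the paper's code: generating skip-gram training pairs from an API call sequence, and reducing the per-class outputs of one-vs.-rest models to a single label. The Python sketch below is a minimal illustration under stated assumptions. The function names `skipgram_pairs` and `combine_one_vs_rest`, the fixed context window, and the API names and class labels in the usage lines are hypothetical. The combiner applies the standard one-vs.-rest decision rule (normalise the per-class scores, then take the argmax); it is not the paper's ModifiedSoftVoting algorithm, whose details the abstract does not give.

```python
from typing import Dict, List, Sequence, Tuple


def skipgram_pairs(api_calls: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) training pairs for a skip-gram model.

    Hypothetical helper: every API call is paired with each other call at
    most `window` positions away in the same sequence. Word2Vec
    implementations may sample the window size per position; a fixed
    window is used here for clarity.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(api_calls):
        start = max(0, i - window)
        stop = min(len(api_calls), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, api_calls[j]))
    return pairs


def combine_one_vs_rest(class_scores: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick one label from the outputs of one-vs.-rest binary models.

    Hypothetical helper: class_scores[k] is the probability that the binary
    model trained for class k assigns to "this sample belongs to k".
    The scores are normalised to sum to 1 and the largest share wins.
    This is the standard one-vs.-rest rule, NOT ModifiedSoftVoting.
    """
    total = sum(class_scores.values())
    if total <= 0:
        raise ValueError("at least one class score must be positive")
    shares = {label: score / total for label, score in class_scores.items()}
    best = max(shares, key=lambda label: shares[label])
    return best, shares


if __name__ == "__main__":
    # Illustrative API names and class labels, not taken from Mal-API-2019.
    calls = ["LdrLoadDll", "LdrGetProcedureAddress", "RegOpenKeyExW"]
    print(skipgram_pairs(calls, window=1))
    # [('LdrLoadDll', 'LdrGetProcedureAddress'),
    #  ('LdrGetProcedureAddress', 'LdrLoadDll'),
    #  ('LdrGetProcedureAddress', 'RegOpenKeyExW'),
    #  ('RegOpenKeyExW', 'LdrGetProcedureAddress')]
    label, _ = combine_one_vs_rest({"Trojan": 0.9, "Worms": 0.3, "Spyware": 0.3})
    print(label)  # Trojan
```

Normalising does not change which class wins; it only makes the per-class shares sum to 1, which is convenient when scores from several models are compared side by side.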

Publisher

PeerJ

Subject

General Computer Science

Link

https://peerj.com/articles/cs-1677.pdf
