Abstract
With the widespread adoption of smartphones, Android has emerged as a preferred and highly targeted platform by malware. The proliferation of malware for Android devices has been exponential and to counter this Android malware detection together with familial classification has to be automated. This paper introduces a dual-pronged approach for Android malware detection and familial classification. The proposed approach employs a static analysis approach to extract Java ARchive (JAR) files from Android application packages (APKs). Our methodology involves utilizing extensive hex strings derived from JAR files and applying n-gram sliding window technique to extract features. To validate the robustness of our model and assess its versatility, we employed both standard and obfuscated malware datasets. A range of machine learning models, including Naive Bayes(NB), Random Forest(RF), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Decision Tree (DT) and a Convolutional Neural Network (CNN) for familial classification, were employed. The experiments encompassed non-obfuscated malware samples (5560), obfuscated malware samples (15479), and benign samples (6200). Additionally, we conducted a comparative analysis of our model's performance against existing methods, including those based on deep learning.