Abstract
In highly sophisticated network attacks, command-and-control (C&C) servers commonly use domain generation algorithms (DGAs) to dynamically produce candidate domains instead of relying on static, hard-coded lists of IP addresses or domain names. Distinguishing DGA-generated domains from legitimate ones is critical for detecting malware and for locating the hidden attackers. The word-based DGAs disclosed in recent network attacks are significantly stealthier than traditional character-based DGAs. In a word-based DGA, two or more words are randomly chosen from one or more specific dictionaries to form a dynamic domain; the generated domains aim to mimic the characteristics of legitimate domains. Existing DGA detection schemes, including the state-of-the-art one based on deep learning, still cannot identify these domains accurately while maintaining an acceptable false alarm rate. In this study, we exploit inter-word and inter-domain correlations using semantic analysis, taking word embeddings and part-of-speech into account. We then propose a detection framework for word-based DGAs that incorporates the frequency distribution of words and that of part-of-speech tags into the design of the feature set. Using an ensemble classifier constructed from Naive Bayes, Extra-Trees, and Logistic Regression, we benchmark the proposed scheme on malicious and legitimate domain samples extracted from public datasets. The experimental results show that the proposed scheme achieves significantly higher detection accuracy for word-based DGAs than three state-of-the-art DGA detection schemes.
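To make the word-based generation step and the word-frequency idea concrete, the following is a minimal sketch and not the authors' implementation. The dictionary `WORDS`, the helper names, the fixed seed, and the `.com` suffix are all hypothetical choices made for illustration. It builds domains by concatenating randomly chosen dictionary words, splits each domain label back into words, and computes the relative frequency of each word across the batch.

```python
import random
from collections import Counter

# Hypothetical toy dictionary; a real word-based DGA ships its own word lists.
WORDS = ["river", "stone", "market", "silver", "garden", "window", "travel", "orange"]


def generate_domain(rng, words=WORDS, n_words=2, tld=".com"):
    """Concatenate n_words randomly chosen dictionary words into one domain."""
    return "".join(rng.choice(words) for _ in range(n_words)) + tld


def segment(label, vocabulary):
    """Split a domain label into vocabulary words via dynamic programming.

    Returns a list of words, or None if the label cannot be fully covered.
    """
    best = [None] * (len(label) + 1)
    best[0] = []
    for end in range(1, len(label) + 1):
        for start in range(end):
            piece = label[start:end]
            if best[start] is not None and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[len(label)]


def word_frequencies(domains, vocabulary):
    """Relative frequency of each vocabulary word across a batch of domains."""
    counts = Counter()
    for domain in domains:
        label = domain.split(".")[0]
        counts.update(segment(label, vocabulary) or [])
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


rng = random.Random(7)  # fixed seed (an assumption) so the run is repeatable
domains = [generate_domain(rng) for _ in range(200)]
freqs = word_frequencies(domains, set(WORDS))
```

Because every generated domain draws from the same small dictionary, the same few words recur across the batch. That recurrence is the kind of skew a word-frequency feature could pick up, although the feature set actually used in the paper is not specified in this abstract.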
Funder
National Natural Science Foundation of China
Natural Science Foundation of Jiangsu Province
Subject
Physics and Astronomy (miscellaneous), General Mathematics, Chemistry (miscellaneous), Computer Science (miscellaneous)
Cited by
16 articles.
1. A review on lexical based malicious domain name detection methods;Annals of Telecommunications;2024-06-13
2. A Novel Model Based on Ensemble Learning for Detecting DGA Botnets;2022 14th International Conference on Knowledge and Systems Engineering (KSE);2022-10-19
3. Malicious Domain Names Detection Algorithm Based on Statistical Features of URLs;2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD);2022-05-04
4. A semantic element representation model for malicious domain name detection;Journal of Information Security and Applications;2022-05
5. Optimal Covert Communication Techniques;International Journal of Informatics and Applied Mathematics;2022-04-11