Affiliation:
1. Department of Computer Science and Statistics (DCCE), São Paulo State University (UNESP), São José do Rio Preto, São Paulo 15054-000, Brazil
Abstract
Domain Generation Algorithms (DGAs) are present in most malware used by botnets and advanced persistent threats. They dynamically generate domain names to maintain and obfuscate communication between the infected device and the attacker’s command and control server. Since many threats use DGAs, it is extremely important to classify a given DGA according to the threat it is related to. In addition, as new threats emerge daily, classifier models tend to become obsolete over time. Deep neural networks tend to lose their classification ability when retrained with a dataset that is significantly different from the initial one, a phenomenon known as catastrophic forgetting. This work presents a computational scheme that combines a deep learning model based on a CNN and natural language processing with an incremental learning technique that adds classes through transfer learning. The scheme classifies 60 DGA families and adds a new family to the classifier model by training it incrementally with some examples from the known families, which avoids catastrophic forgetting and maintains metric levels. The proposed methodology achieved an average precision of 86.75%, an average recall of 83.06%, and an average F1 score of 83.78% with the full dataset, and suffered minimal losses when applying the class increment.
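The abstract reports precision, recall, and F1 as averages over DGA families without stating the averaging scheme. The sketch below assumes macro-averaging (each family weighted equally), which is an assumption and not something the abstract confirms. The function name `macro_metrics` and the family labels are hypothetical and used only for illustration. Under this scheme the average F1 is the mean of the per-family F1 scores, so it need not equal the harmonic mean of the average precision and average recall.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return (precision, recall, f1), each averaged with equal weight per class.

    Assumption: macro-averaging; the abstract does not state which scheme was used.
    """
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in classes:
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels standing in for three DGA families.
y_true = ["fam_a", "fam_a", "fam_a", "fam_b", "fam_b", "fam_c"]
y_pred = ["fam_a", "fam_a", "fam_b", "fam_b", "fam_c", "fam_c"]
precision, recall, f1 = macro_metrics(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

If the averaging were instead weighted by the number of samples per family, large families would dominate the reported figures; macro-averaging gives small families the same influence as large ones.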
Funder
National Council for Scientific and Technological Development (CNPq)
NIC.BR—Núcleo de Informação e Coordenação do Ponto BR