Abstract
Botnets often use domain generation algorithms (DGAs) to evade detection: they generate large numbers of pseudo-random domain names, only a few of which are actually registered by cybercriminals. In this paper, we address how DGA-generated domain names can be detected by means of machine learning and deep learning. We first present an extensive literature review of recent work in which machine learning and deep learning have been applied to detect DGA-generated domain names. We observe that a common methodology is still missing, and that the use of different datasets makes experimental results hard to compare. We next propose using TF-IDF to measure the frequencies of the most relevant n-grams in domain names, and we use these frequencies as features in learning algorithms. We perform experiments with various machine-learning and deep-learning models using TF-IDF features, of which a deep MLP model yields the best results. For comparison, we also apply an LSTM model with an embedding layer that converts each domain name from a sequence of characters into a vector representation. The performance of our LSTM and MLP models is rather similar: they achieve AUCs of 0.994 and 0.995 and average F1-scores of 0.907 and 0.891, respectively.
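As an illustration of the TF-IDF feature extraction described above, the following is a minimal sketch in plain Python. It is not the authors' implementation: the n-gram range (2 to 3 characters), the smoothed IDF formula, the L2 normalisation, the helper names, and the example domain names are all assumptions made for this sketch. It also keeps every n-gram, whereas the abstract refers to selecting the most relevant ones.

```python
import math
from collections import Counter

def char_ngrams(domain, n_min=2, n_max=3):
    """Return all character n-grams of a domain name for n in [n_min, n_max]."""
    s = domain.lower()
    return [s[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)]

def fit_idf(domains, n_min=2, n_max=3):
    """Compute smoothed inverse document frequencies over a list of domains."""
    df = Counter()
    for d in domains:
        # Count each n-gram at most once per domain (document frequency).
        df.update(set(char_ngrams(d, n_min, n_max)))
    n_docs = len(domains)
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf_vector(domain, idf, n_min=2, n_max=3):
    """Map one domain name to an L2-normalised sparse TF-IDF vector (a dict)."""
    # n-grams never seen while fitting the IDF table are ignored.
    tf = Counter(g for g in char_ngrams(domain, n_min, n_max) if g in idf)
    vec = {g: count * idf[g] for g, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else vec

# Illustrative corpus only: two ordinary names and two random-looking ones.
corpus = ["google.com", "wikipedia.org", "xjw3kq9zpl.net", "qzv8r1mmtk.com"]
idf = fit_idf(corpus)
features = tfidf_vector("google.com", idf)
```

The resulting sparse vectors could then be passed to any classifier, such as an MLP. In this toy corpus, n-grams that occur in only one name (for example "oo") receive a higher IDF weight than n-grams shared across names (for example "co").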
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
14 articles.
1. Comparing Deep Neural Networks and Machine Learning for Detecting Malicious Domain Name Registrations;2024 IEEE International Conference on Omni-layer Intelligent Systems (COINS);2024-07-29
2. A review on lexical based malicious domain name detection methods;Annals of Telecommunications;2024-06-13
3. Evading deep learning-based DGA detectors: current problems and solutions;Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications VI;2024-06-07
4. Word encoding for word-looking DGA-based Botnet classification;2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC);2023-10-31
5. Use of subword tokenization for domain generation algorithm classification;Cybersecurity;2023-09-07