Use of subword tokenization for domain generation algorithm classification

Published: 2023-09-07 Issue: 1 Volume: 6
ISSN: 2523-3246
Container-title: Cybersecurity
Language: en
Short-container-title: Cybersecurity

Author:

Liew Sea Ran Cleon, Law Ngai Fong

Abstract

Domain name generation algorithm (DGA) classification is an essential but challenging problem. Both feature-extracting machine learning (ML) methods and deep learning (DL) models, such as convolutional neural networks and long short-term memory networks, have been developed. However, the performance of these approaches varies with the type of DGA. Most features used in the ML methods characterize random-looking DGAs better than word-looking DGAs. To improve classification performance on word-looking DGAs, subword tokenization is employed for the DL models. Our experimental results show that subword tokenization provides excellent classification performance on word-looking DGAs. We then propose an integrated scheme that chooses an appropriate method for DGA classification depending on the nature of the DGAs. Results show that the integrated scheme outperforms existing ML and DL methods, as well as the subword DL methods.

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Networks and Communications,Information Systems,Software

Link

https://link.springer.com/content/pdf/10.1186/s42400-023-00183-8.pdf

Reference: 27 articles.

1. Almashhadani AO, Kaiiali M, Carlin D, Sezer S (2020) MaldomDetector: a system for detecting algorithmically generated domain names with machine learning. Comput Secur 93:101787

2. Antonakakis M, Perdisci R, Nadji Y, Vasiloglou N, Abu-Nimeh S, Lee W, Dagon D (2012) From throw-away traffic to bots: detecting the rise of DGA-based malware. In: USENIX security symposium, p 24

3. Berman DS (2019) DGA CapsNet: 1D application of capsule networks to DGA detection. Information 10:157

4. Bilge L, Şen S, Balzarotti D, Kirda E, Krügel C (2014) Exposure: a passive DNS analysis service to detect and report malicious domains. ACM Trans Inf Syst Secur 16:14

5. Cucchiarelli A, Morbidoni C, Spalazzi L, Baldi M (2021) Algorithmically generated malicious domain names detection based on n-grams features. Expert Syst Appl 170:114551

Cited by 1 article.

1. Word encoding for word-looking DGA-based Botnet classification;2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC);2023-10-31