Affiliation:
1. School of Computing Science, Chengdu University of Information Technology, Chengdu 610225, China
2. School of Computer Science, University of Nottingham Jubilee Campus, Nottingham NG8 1BB, UK
Abstract
The growth of the Internet and network applications has driven the development of encrypted communication technology. Malicious traffic, however, also uses encryption to evade traditional security protection and detection, and traditional methods cannot accurately detect encrypted malicious traffic. In recent years, advances in artificial intelligence have made it possible to use machine learning and deep learning to detect encrypted malicious traffic without decryption, and with high accuracy. Current research on malicious encrypted traffic detection focuses mainly on analyzing the characteristics of encrypted traffic and on selecting machine learning algorithms. This paper proposes a detection method that combines natural language processing with machine learning: the detection model is built on TF-IDF. During data preprocessing, the method applies the TF-IDF model to extract information from the data, obtain the importance of keywords, and then reconstruct the features of the data. As a result, the TF-IDF-based method does not need to analyze each field of the data set. Experimental results show that, compared with the usual machine learning preprocessing approach of data encoding, preprocessing the data with natural language processing techniques effectively improves detection accuracy. For model construction, a gradient boosting classifier, a random forest classifier, an AdaBoost classifier, and an ensemble model based on these three classifiers are each used. A convolutional neural network (CNN) is also trained, since a CNN can effectively extract information from the data.
With the same input data supplied to the classifiers and the neural network, a comparison of the methods shows that the accuracy of the one-dimensional convolutional network is slightly higher than that of the machine learning classifiers.
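The TF-IDF preprocessing step described above can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant or tokenization the authors use, so this example assumes the plain unsmoothed form (term frequency times log(N / document frequency)) and treats each flow record as a list of hypothetical field tokens; the token names and the `tfidf` helper are illustrative only, not the paper's implementation.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {token: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each token.
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            tok: (cnt / total) * math.log(n_docs / df[tok])
            for tok, cnt in counts.items()
        })
    return weights


# Hypothetical flow records, each flattened into field tokens (assumption).
flows = [
    "tls1.2 port443 cipher_c02f selfsigned".split(),
    "tls1.2 port443 cipher_c02b".split(),
    "tls1.2 port8443 cipher_c02f selfsigned".split(),
]

weights = tfidf(flows)

# Fixed-length feature vectors: one column per vocabulary token.
vocab = sorted({tok for flow in flows for tok in flow})
matrix = [[w.get(tok, 0.0) for tok in vocab] for w in weights]
```

A token that appears in every record (here `tls1.2`) receives weight zero, while tokens that are rare across records receive the largest weights. The resulting rows of `matrix` are the kind of reconstructed features that could then be passed to a classifier or a one-dimensional convolutional network.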
Funder
Sichuan Science and Technology Program
Subject
Computer Networks and Communications, Information Systems
Cited by
9 articles.