Abstract
Cyber attacks are increasing rapidly and have a significant impact on network security. Many cyber attacks are carried out through malicious Uniform Resource Locators (URLs), and various approaches have therefore been developed to detect them. Machine learning and deep learning are among the most competitive of these techniques. However, the specific techniques for URL feature extraction and the choice of machine learning algorithm are still under development. This paper aims to provide a reference for selecting feature extraction methods and machine learning algorithms. In the designed experiment, the selected URLs are processed by two different feature extraction methods: (1) tokenization and vectorization, and (2) lexical feature selection. The results form two different datasets (data1 and data2) for machine learning. Two traditional learning algorithms (Logistic Regression and SVM) and three ensemble learning algorithms (Random Forest, Gradient Boosting, and Bagging) are adopted as detection models for both datasets. The experimental results demonstrate that tokenization and vectorization for feature extraction, combined with ensemble learning algorithms, yields good predictive performance in malicious URL detection.
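The two feature extraction routes named in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything concrete here is an assumption made for illustration: the toy URLs, the alphanumeric tokenization rule, the plain count vectorization, the particular lexical features, and the mapping of the names data1 and data2 onto the two routes. The abstract does not specify the paper's actual tokenizer, vectorizer, or feature set, and the sketch stops before any model training.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical toy URLs; not taken from the paper's dataset.
URLS = [
    "http://example.com/login.php?user=admin",
    "https://docs.example.org/guide/index.html",
]


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed rule)."""
    return [t for t in re.split(r"[^A-Za-z0-9]+", url.lower()) if t]


def build_vocabulary(urls):
    """Map each distinct token to a column index, in sorted order."""
    tokens = sorted({t for u in urls for t in tokenize(u)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(url, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(tokenize(url))
    vec = [0] * len(vocab)
    for tok, n in counts.items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


def lexical_features(url):
    """A few hand-picked lexical features (an illustrative choice)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_digits": sum(c.isdigit() for c in url),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "uses_https": int(parsed.scheme == "https"),
    }


# Assumed naming: data1 holds token count vectors, data2 holds lexical features.
vocab = build_vocabulary(URLS)
data1 = [vectorize(u, vocab) for u in URLS]
data2 = [lexical_features(u) for u in URLS]
```

In this sketch each row of data1 has one column per vocabulary token, so its width grows with the corpus, while each row of data2 has a fixed, small number of columns. Either representation could then be passed to the classifiers listed in the abstract.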
Publisher
Darcy & Roy Press Co. Ltd.