Malicious URL Detection: An Evaluation of Feature Extraction and Machine Learning Algorithm

Published: 2022-12-03
Journal: Highlights in Science, Engineering and Technology (HSET)
Volume: 23, pages 117–123
ISSN: 2791-0210

Author:

Wang Yichen

Abstract

Cyberattacks are increasing rapidly and have a significant impact on network security. Many cyberattacks are carried out through malicious Uniform Resource Locators (URLs), so various approaches have been developed to detect them. Among the most competitive are machine learning and deep learning techniques. However, the specific techniques for URL feature extraction and the choice of machine learning algorithm are still being developed. This paper aims to provide a reference for selecting feature extraction methods and machine learning algorithms. In the experiment, the selected URLs are processed with two different feature extraction methods: tokenization and vectorization, and lexical feature selection. This produces two datasets (data1 and data2) for machine learning. Two traditional learning algorithms (Logistic Regression and SVM) and three ensemble learning algorithms (Random Forest, Gradient Boosting, and Bagging) are applied as detection models to both datasets. The experimental results show that tokenization and vectorization for feature extraction, combined with ensemble learning algorithms, can deliver good predictive performance in malicious URL detection.
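
The abstract names the two feature extraction methods without detailing them. The Python sketch below (standard library only) illustrates what each could look like. The tokenizer regex, the use of raw counts rather than TF-IDF, and every lexical feature name (`url_length`, `host_is_ipv4`, and so on) are assumptions for illustration, not the paper's actual choices. The first method turns each URL into a vocabulary-sized count vector; the second turns it into a short fixed-length vector.

```python
from __future__ import annotations

import re
from collections import Counter
from urllib.parse import urlparse

# --- Method 1: tokenization + vectorization (bag-of-words counts) ---
# Assumption: a token is a maximal run of ASCII letters/digits; the paper's
# actual delimiters and weighting (raw counts vs. TF-IDF) are not stated.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens."""
    return [tok.lower() for tok in TOKEN_RE.findall(url)]


def build_vocabulary(urls: list[str]) -> dict[str, int]:
    """Assign a column index to every distinct token seen in the training URLs."""
    vocab: dict[str, int] = {}
    for url in urls:
        for tok in tokenize(url):
            vocab.setdefault(tok, len(vocab))  # first occurrence fixes the index
    return vocab


def vectorize(url: str, vocab: dict[str, int]) -> list[int]:
    """Count occurrences of each vocabulary token; unseen tokens are dropped."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokenize(url)).items():
        if tok in vocab:
            vec[vocab[tok]] = n
    return vec


# --- Method 2: hand-picked lexical features (hypothetical selection) ---
IPV4_RE = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict[str, int]:
    """Return a fixed set of simple lexical URL features (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_sign": int("@" in url),
        "host_is_ipv4": int(IPV4_RE.match(host) is not None),
        "uses_https": int(parsed.scheme == "https"),
        "path_depth": parsed.path.count("/"),
    }


if __name__ == "__main__":
    train = [
        "https://www.example.com/docs/index.html",
        "http://192.168.10.5/login.php?user=admin@bank",
    ]
    vocab = build_vocabulary(train)  # 17 distinct tokens
    # Method 1 row for an unseen URL
    print(vectorize("https://www.example.com/login.php", vocab))
    # → [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
    # Method 2 row for the IP-host URL
    print(list(lexical_features(train[1]).values()))
    # → [45, 12, 4, 0, 9, 1, 1, 0, 1]
```

Either feature matrix could then be fed to the five classifiers named in the abstract, for example scikit-learn's `LogisticRegression`, `SVC`, `RandomForestClassifier`, `GradientBoostingClassifier`, and `BaggingClassifier`; the abstract does not say which library the paper used. One design consequence of the two methods: the bag-of-words width grows with the training vocabulary and ignores tokens unseen at training time, whereas the lexical vector has the same length for every URL.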

Publisher

Darcy & Roy Press Co. Ltd.

References (24 articles)

1. Symantec, “Internet Security Threat Report (ISTR) 2019,” https://www.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf [last accessed 10/2019].

2. M. Cova, C. Kruegel, and G. Vigna, “Detection and analysis of drive-by download attacks and malicious JavaScript code,” in Proceedings of the 19th International Conference on World Wide Web, ACM, 2010, pp. 281–290.

3. M. Khonji, Y. Iraqi, and A. Jones, “Phishing detection: a literature survey,” IEEE Communications Surveys and Tutorials, 2013, vol. 15, no. 4, pp. 2091–2121.

4. R. Heartfield and G. Loukas, “A taxonomy of attacks and a survey of defense mechanisms for semantic social engineering attacks,” ACM Computing Surveys (CSUR), 2015, vol. 48, no. 3, p. 37.

5. D. Sahoo, C. Liu, and S.C.H. Hoi, “Malicious URL detection using machine learning: a survey,” 1, 1 (August 2019), 37 pages, https://doi.org/10.1145/nnnnnnn.nnnnnnn, 2019.

Cited by 3 articles.

1. Improving Cybersecurity: A Comparative Analysis of Machine Learning-Based Uniform Resource Locator (URL) Classification. 2024 7th International Conference on Informatics and Computational Sciences (ICICoS), 2024-07-17.

2. Detection of Malicious URLs using Ensemble learning techniques. 2023 IEEE Technology & Engineering Management Conference - Asia Pacific (TEMSCON-ASPAC), 2023-12-14.

3. Techniques for Creating the Image of a Child Character in the Stories of V. Astafyev’s Book “The Last Bow”. Tomsk state pedagogical university bulletin, 2023-01-30.