A Machine Learning Based Three-Step Framework for Malicious URL Detection

Published: 2023-08-28

Author:

Chen Qisheng¹, Omote Kazumasa²

Affiliation:

1. University of Tsukuba

2. University of Tsukuba, Graduate School of Systems and Information Engineering

Abstract

To address the shortcomings of blacklist-based detection of malicious URLs, such as slow update speed, research on using machine learning to detect malicious URLs is increasing. These studies have proposed their own methods and reported high accuracy, but survey research on malicious URL detection remains insufficient. In this paper, we propose a three-step framework for malicious URL detection. We review 14 related works through this framework and find that almost all machine-learning research on malicious URL detection can be classified by it. We also evaluate several machine learning models and context-considering methods, and their suitability, within the framework. The results confirm the importance of considering context: we find that context-considering embedding methods are more important, and that malicious URL detection accuracy improves when context-considering methods are used.

Publisher

Research Square Platform LLC

References (39 articles)

1. Hung Le, Quang Pham, Doyen Sahoo, and Steven C. H. Hoi (2018) URLNet: Learning a URL Representation with Deep Learning for Malicious URL Detection. CoRR abs/1802.03162, arXiv:1802.03162, http://arxiv.org/abs/1802.03162

2. Satomi Kaneko, Akira Yamada, Yukiko Sawaya, Tran Phuong Thao, Ayumu Kubota, and Kazumasa Omote (2020) Detecting Malicious Websites by Query Templates. In: Emil Simion and Rémi Géraud-Stewart (eds.) Innovative Security Solutions for Information Technology and Communications. Springer International Publishing, ISBN 978-3-030-41025-4

3. H. Yuan, Z. Yang, X. Chen, Y. Li, and W. Liu (2018) URL2Vec: URL Modeling with Character Embeddings for Fast and Accurate Phishing Website Detection. In: 2018 IEEE Intl Conf on Parallel Distributed Processing with Applications, Ubiquitous Computing Communications, Big Data Cloud Computing, Social Computing Networking, Sustainable Computing Communications, pp. 265-272, doi:10.1109/BDCloud.2018.00050

4. Yoav Goldberg and Omer Levy (2014) word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method. CoRR abs/1402.3722, arXiv:1402.3722, http://arxiv.org/abs/1402.3722

5. Xin Rong (2014) word2vec Parameter Learning Explained. CoRR abs/1411.2738, arXiv:1411.2738, http://arxiv.org/abs/1411.2738