A Machine Learning Based Three-Step Framework for Malicious URL Detection

Author:

Chen Qisheng1, Omote Kazumasa2

Affiliation:

1. University of Tsukuba

2. University of Tsukuba, Graduate School of Systems and Information Engineering (Tsukuba Daigaku Daigakuin System Joho Kogaku Kenkyuka)

Abstract

To overcome the shortcomings of blacklist-based detection of malicious URLs, such as slow update speed, research on using machine learning to detect malicious URLs is increasing. These studies have each proposed their own methods and achieved high accuracy, but survey research on malicious URL detection remains insufficient. In this paper, we propose a three-step framework for malicious URL detection. We review 14 related works through this framework and find that almost all research on malicious URL detection using machine learning can be classified by it. Using the framework, we evaluate several machine learning models and context-considering methods, along with their suitability. The results confirm the importance of considering context: we find that context-considering embedding methods are more important, and that malicious URL detection accuracy improves with context-considering methods.

Publisher

Research Square Platform LLC

