BERT-Based Approaches to Identifying Malicious URLs

Published:2023-10-16 Issue:20 Volume:23 Page:8499
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Su Ming-Yang¹, Su Kuan-Lin¹

Affiliation:

1. Department of Computer Science and Information Engineering, Ming Chuan University, Taoyuan City 333, Taiwan

Abstract

Malicious uniform resource locators (URLs) are prevalent in cyberattacks, particularly in phishing attempts aimed at stealing sensitive information or distributing malware, so accurate detection of malicious URLs is essential. Prior research has explored deep-learning models for this task: URL strings are segmented into character-level or word-level tokens, the tokens are embedded, and a trained model then distinguishes malicious URLs from benign ones. In this study, a model based on bidirectional encoder representations from transformers (BERT) was devised to tokenize URL strings, using its self-attention mechanism to better capture correlations among tokens. A classifier was then employed to determine whether a given URL was malicious. To evaluate the proposed methods, three types of public datasets were used: a dataset consisting solely of URL strings from Kaggle, a dataset containing only URL features from GitHub, and a dataset including both types of data from the University of New Brunswick, namely ISCX 2016. The proposed system achieved accuracy rates of 98.78%, 96.71%, and 99.98% on the three datasets, respectively. Additionally, experiments were conducted on two datasets from different domains, the Internet of Things (IoT) and Domain Name System over HTTPS (DoH), to demonstrate the versatility of the proposed model.
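
As a rough illustration of the character-level and word-level segmentation that the abstract attributes to prior work, the following minimal Python sketch splits a URL both ways. It is not the authors' code: the function names, the delimiter rule (split on any non-alphanumeric character), and the example URL are assumptions made for illustration. A BERT tokenizer would typically produce subword (WordPiece) tokens instead, which this sketch does not reproduce.

```python
import re
from typing import List


def char_tokens(url: str) -> List[str]:
    """Character-level segmentation: every character becomes one token."""
    return list(url)


def word_tokens(url: str) -> List[str]:
    """Word-level segmentation: split on runs of non-alphanumeric
    characters (an assumed delimiter rule) and drop empty pieces."""
    return [tok for tok in re.split(r"[^A-Za-z0-9]+", url) if tok]


if __name__ == "__main__":
    # Hypothetical URL, used only to show the two segmentations.
    example = "http://login.example.com/verify?id=42"
    print(char_tokens(example)[:7])
    print(word_tokens(example))
```

Either token sequence would then be mapped to embeddings and passed to a trained classifier; the choice of granularity trades vocabulary size against sequence length.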

Funder

Ministry of Science and Technology (MOST) Taiwan

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/20/8499/pdf

Cited by 3 articles.

1. A comprehensive literature review on phishing URL detection using deep learning techniques;Journal of Cyber Security Technology;2024-07-23

2. Context and Multi-Features-Based Vulnerability Detection: A Vulnerability Detection Frame Based on Context Slicing and Multi-Features;Sensors;2024-02-20

3. Malicious Website Detection Method Based on BGResNet Multi-feature Fusion;2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE);2023-10-27