Phishing Webpage Detection via Multi-Modal Integration of HTML DOM Graphs and URL Features Based on Graph Convolutional and Transformer Networks

Published: 2024-08-22 Issue: 16 Volume: 13 Page: 3344
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Yoon Jun-Ho¹, Buu Seok-Jun¹, Kim Hae-Jung²

Affiliation:

1. Department of Computer Engineering, Gyeongsang National University, Jinju-si 52828, Republic of Korea

2. Department of Computer Engineering, Kyungil University, Gyeongsan-si 38428, Republic of Korea

Abstract

Detecting phishing webpages is a critical task in the field of cybersecurity, with significant implications for online safety and data protection. Traditional methods have primarily relied on analyzing URL features, which can be limited in capturing the full context of phishing attacks. In this study, we propose an innovative approach that integrates HTML DOM graph modeling with URL feature analysis using advanced deep learning techniques. The proposed method leverages Graph Convolutional Networks (GCNs) to model the structure of HTML DOM graphs, combined with Convolutional Neural Networks (CNNs) and Transformer Networks to capture the character and word sequence features of URLs, respectively. These multi-modal features are then integrated using a Transformer network, which is adept at selectively capturing the interdependencies and complementary relationships between different feature sets. We evaluated our approach on a real-world dataset comprising URL and HTML DOM graph data collected from 2012 to 2024. This dataset includes over 80 million nodes and edges, providing a robust foundation for testing. Our method demonstrated a significant improvement in performance, achieving a 7.03 percentage point increase in classification accuracy compared to state-of-the-art techniques. Additionally, we conducted ablation tests to further validate the effectiveness of individual features in our model. The results validate the efficacy of integrating HTML DOM structure and URL features using deep learning. Our framework significantly enhances phishing detection capabilities, providing a more accurate and comprehensive solution to identifying malicious webpages.
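
The abstract names three inputs: an HTML DOM graph for the GCN branch, a URL character sequence for the CNN branch, and a URL word sequence for the Transformer branch. The abstract does not describe the preprocessing, so the following is only a minimal sketch of how such inputs might be prepared with the Python standard library. The helper names (`DomGraphBuilder`, `build_dom_graph`, `url_char_sequence`, `url_word_sequence`), the choice of tag names as node labels, and the word-splitting rule are all illustrative assumptions, not the authors' implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit
import re

# Void elements never get a closing tag, so they are not kept on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Hypothetical helper: collects element nodes and parent->child edges."""

    def __init__(self):
        super().__init__()
        self.nodes = []    # nodes[i] is the tag name of node i
        self.edges = []    # (parent_index, child_index) pairs
        self._stack = []   # indices of currently open elements

    def handle_starttag(self, tag, attrs):
        index = len(self.nodes)
        self.nodes.append(tag)
        if self._stack:
            self.edges.append((self._stack[-1], index))
        if tag not in VOID_TAGS:
            self._stack.append(index)

    def handle_endtag(self, tag):
        # Close the most recent matching open element; ignore stray closing tags.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.nodes[self._stack[pos]] == tag:
                del self._stack[pos:]
                break


def build_dom_graph(html):
    """Return (nodes, edges) for an HTML string (assumed graph encoding)."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    return builder.nodes, builder.edges


def url_char_sequence(url):
    """Character-level view of the URL, e.g. as input to a character CNN."""
    return list(url)


def url_word_sequence(url):
    """Word-level view: split host, path, and query on non-alphanumerics (assumed rule)."""
    parts = urlsplit(url)
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [word for word in re.split(r"[^A-Za-z0-9]+", text) if word]


if __name__ == "__main__":
    page = "<html><body><form><input><a href='x'>go</a></form></body></html>"
    nodes, edges = build_dom_graph(page)
    print(nodes)
    print(edges)
    print(url_word_sequence("http://login.example.com/secure/update.php?id=7"))
```

In a full model, the node list and edge list would typically be converted into a feature matrix and an adjacency matrix for the graph branch, and the two URL sequences would be mapped to integer indices before embedding. Those steps depend on design choices the abstract does not specify.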

Funder

Korea government

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-9292/13/16/3344/pdf
