Author:
Wang Yilin,Xue Siqing,Song Jun
Abstract
In recent years, with the rapid development of the Internet and information technology, video websites, shopping websites, and other portals have grown rapidly. However, malicious webpages can disguise themselves as benign websites and steal users’ private information, which seriously threatens network security. Current detection methods for malicious webpages do not fully utilize the syntactic and semantic information in the web source code. In this paper, we propose a GCN-based malicious webpage detection method (GMWD), which constructs a text graph to describe and then a GCN model to learn the syntactic and semantic correlations within and between webpage source codes. We replace word nodes in the text graph with phrase nodes to better maintain the syntactic and semantic integrity of the webpage source code. In addition, we use the URL links appearing in the source code as auxiliary detection information to further improve the detection accuracy. The experiments showed that the proposed method can achieve 99.86% accuracy and a 0.137% false negative rate, achieving a better performance than other related malicious webpage detection methods.
Subject
General Mathematics,Engineering (miscellaneous),Computer Science (miscellaneous)