Website link detection is an important means of ensuring the security of external links. In the past, it was implemented mainly through blacklists and machine learning based on feature engineering, approaches that suffer from slow detection and weak model generalization. The development of neural networks has brought new solutions to security detection for a website's external links. To address the performance bottleneck caused by the variable content length of web pages, this article proposes a website external link security detection algorithm based on multi-modal fusion. The algorithm extracts text, dynamic script, and image features separately and constructs a deep fusion model that combines these multi-modal features. Compared with previous research results, the proposed method outperforms traditional single-modal methods and can identify malicious web pages quickly and accurately, improving accuracy by 2.7% and the F1 score by 0.026.
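
As a rough illustration of the fusion idea described above, the following minimal sketch concatenates one feature vector per modality and scores the result with a single logistic unit. It is not the model proposed in this article: the function names `fuse_features` and `malicious_score`, the concatenation strategy, and the logistic scorer are all assumptions made for illustration, and in practice the three input vectors would come from learned text, script, and image encoders.

```python
import math
from typing import List, Sequence


def fuse_features(
    text_vec: Sequence[float],
    script_vec: Sequence[float],
    image_vec: Sequence[float],
) -> List[float]:
    """Fuse modalities by simple concatenation (an assumed strategy)."""
    return [*text_vec, *script_vec, *image_vec]


def malicious_score(
    fused: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Map a fused vector to a value in (0, 1) with a logistic unit.

    The weights are hypothetical; a real model would learn them.
    """
    if len(fused) != len(weights):
        raise ValueError("fused vector and weights must have equal length")
    z = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy feature vectors standing in for encoder outputs.
    fused = fuse_features([0.2, 0.7], [0.9], [0.1, 0.4])
    score = malicious_score(fused, weights=[0.5, 1.0, 2.0, -0.3, 0.8])
    print(f"fused dimension: {len(fused)}, score: {score:.3f}")
```

Concatenation is the simplest way to combine modalities; a deep fusion model would typically pass the combined vector through further learned layers before classification.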